Tamper-Resistant Arithmetic for Public-Key  Cryptography by Gaubatz, Gunnar
Worcester Polytechnic Institute
Digital WPI
Doctoral Dissertations (All Dissertations, All Years) Electronic Theses and Dissertations
2007-03-01
Tamper-Resistant Arithmetic for Public-Key
Cryptography
Gunnar Gaubatz
Worcester Polytechnic Institute
Follow this and additional works at: https://digitalcommons.wpi.edu/etd-dissertations
This dissertation is brought to you for free and open access by Digital WPI. It has been accepted for inclusion in Doctoral Dissertations (All
Dissertations, All Years) by an authorized administrator of Digital WPI. For more information, please contact wpi-etd@wpi.edu.
Repository Citation
Gaubatz, G. (2007). Tamper-Resistant Arithmetic for Public-Key Cryptography. Retrieved from https://digitalcommons.wpi.edu/etd-
dissertations/64
Tamper-Resistant Arithmetic for Public-Key
Cryptography
by
Gunnar Gaubatz
A Dissertation
Submitted to the Faculty
of the
Worcester Polytechnic Institute
In partial fulfillment of the requirements for the
Degree of Doctor of Philosophy
in
Computer Engineering
March, 2007
Approved:
Prof. Berk Sunar
ECE Department
Dissertation Advisor
Prof. Mark G. Karpovsky
ECE Department
Boston University
Prof. William J. Martin
Department of Mathematical
Sciences
Prof. Brian M. King
ECE Department
Prof. Fred J. Looft
ECE Department Head

For my parents.

Abstract
Cryptographic hardware has found many uses in ubiquitous and pervasive security de-
vices with a small form factor, e.g. SIM cards, smart cards, electronic security tokens,
and soon even RFIDs. With applications in banking, telecommunication, healthcare, e-
commerce and entertainment, these devices use cryptography to provide security services
like authentication, identification and confidentiality to the user.
However, the widespread adoption of these devices into the mass market, and the lack
of a physical security perimeter have increased the risk of theft, reverse engineering, and
cloning. Despite the use of strong cryptographic algorithms, these devices often succumb to
powerful side-channel attacks. These attacks provide a motivated third party with access to
the inner workings of the device and therefore the opportunity to circumvent the protection
of the cryptographic envelope. Apart from passive side-channel analysis, which has been the
subject of intense research for over a decade, active tampering attacks like fault analysis have
recently gained increased attention from the academic and industrial research community.
In this dissertation we address the question of how to protect cryptographic devices
against these kinds of attacks. More specifically, we focus our attention on public key
algorithms like elliptic curve cryptography and their underlying arithmetic structure. In our
research we address challenges such as the cost of implementation, the level of protection,
and the error model in an adversarial situation. The approaches that we investigate all
apply concepts from coding theory, in particular the theory of cyclic codes. This seems
intuitive, since both public key cryptography and cyclic codes share finite field arithmetic
as a common foundation.
The major contributions of our research are (a) a generalization of cyclic codes that
allow embedding of finite fields into redundant rings under a ring homomorphism, (b) a new
family of non-linear arithmetic residue codes with very high error detection probability, (c)
a set of new low-cost arithmetic primitives for optimal extension field arithmetic based on
robust codes, and (d) design techniques for tamper-resilient finite state machines.

Acknowledgements
First and foremost, I would like to thank my advisor and mentor Dr. Berk Sunar for
his tremendous support and guidance throughout the years. His patience and insights have
resulted in a fruitful collaboration that I am truly thankful for.
I would like to extend my sincere gratitude to my dissertation committee members, Drs.
Mark Karpovsky, Brian King and William Martin, whose valuable suggestions and inputs
have improved this dissertation measurably. Additionally, the generous support of the Na-
tional Science Foundation (under grants No. NSF-ANI-0112889 and NSF-ANI-0133297), of
Intel Corporation, as well as that from WPI’s Axel F. Backlin scholarship fund is gratefully
acknowledged.
As important as academic advice and funding is, the support that comes from the
people around us shall not be forgotten, either. Many thanks to my dear friends, col-
leagues and collaborators in the lab: Jens-Peter Kaps, Selc¸uk Baktır, Erdinc¸ O¨ztu¨rk, Deniz
Karakoyunlu, and Ghaith Hammouri; to my colleagues from the lab upstairs: Dave Holl
and Ben Woodacre; to the helpful and indispensable office staff: Cathy Emmerton, Brenda
McDonald, and Colleen Sweeney; and to the fearless leader of the ECE department and
general good sport: Dr. Fred Looft. Thank you all!
Words cannot express how thankful I am to my parents, Erwin and Inge Gaubatz, for
all their love and support throughout the years: my Dad inspired me to become an engineer,
showed me how things work, and how to fix them if they don’t; my Mom with her wonderful
big heart taught me responsibility, was always there for me, and let me go out into the world
when I had to. I am indebted to my sister Corinna, who is there for our parents when I
cannot. To my parents- and sisters-in-law, Monir, Khosro, Yasi and Kiana, for their love
and support: kheili mamnoon!
Most importantly, however, I would like to thank my wife and best friend Nastaran for
all her love, care and encouragement. She stuck with me through some hard times, gave me
motivation when I was discouraged, and was always close to me—regardless of the North
American continent that physically separated us for most of our graduate student life.
San Jose, CA, April 2007
Gunnar Gaubatz

Contents
Abstract i
Acknowledgements iii
Table of Contents v
List of Tables viii
List of Figures ix
List of Algorithms x
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Related Work 7
2.1 Homomorphic Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Robust Arithmetic Residue Codes . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Low Cost Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Tamper Resilient Control Structures . . . . . . . . . . . . . . . . . . . . . . 10
3 Side-channel Analysis: A Taxonomy 13
3.1 Passive Side-Channel Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Active Side-Channel Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4 An Adversarial Error Model 19
4.1 Faults from an Attacker’s Perspective . . . . . . . . . . . . . . . . . . . . . 21
4.2 Fault Attack Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3 Abstract Error Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5 Cyclic Codes and Homomorphic Embedding 25
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.1.1 Separate Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.1.2 Non-separate Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
v
5.2 Review of Relevant Coding Theory . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.1 Cyclic Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.2 Arithmetic Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.3 Cyclic Code-Based Homomorphic Embedding . . . . . . . . . . . . . 36
5.3 Cryptographically Significant Finite Fields . . . . . . . . . . . . . . . . . . . 37
5.4 A Generalization of Cyclic Codes . . . . . . . . . . . . . . . . . . . . . . . . 39
5.5 Homomorphic Embedding in Rings . . . . . . . . . . . . . . . . . . . . . . . 41
5.5.1 Basic Scaled Embedding . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.5.2 Idempotent Scaled Embedding . . . . . . . . . . . . . . . . . . . . . 46
5.5.3 Error Correction Using Algorithm-Based Fault Tolerance . . . . . . 51
5.6 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Robust Codes 53
6.1 Introduction to Robust Codes . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Robust Arithmetic Residue Codes . . . . . . . . . . . . . . . . . . . . . . . 55
6.3 Robust Arithmetic Operations . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.4 Robust Montgomery Multiplication . . . . . . . . . . . . . . . . . . . . . . . 63
6.5 Analysis and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7 Low Cost Techniques 69
7.1 Optimal Extension Fields and their Arithmetic . . . . . . . . . . . . . . . . 70
7.2 Robust Block Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.3 Suitable Codes for OEF Arithmetic . . . . . . . . . . . . . . . . . . . . . . . 73
7.4 Robust OEF Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.4.1 Initial Encoding and Detection Network . . . . . . . . . . . . . . . . 76
7.4.2 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.4.3 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.4.4 Squaring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.4.5 Multiplication by Scalar . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.4.6 Shifting of Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.4.7 Frobenius Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.4.8 Further Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.5 Software Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.6 Performance Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8 Tamper Resilient Control Structures 89
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.3 Preliminaries and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.3.1 Attacker Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.3.2 A Novel Effectiveness Metric: Attack Effort per Area . . . . . . . . . 98
8.4 Redundant State Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.5 Fault-Resilient Sequential Circuits . . . . . . . . . . . . . . . . . . . . . . . 102
vi
8.5.1 Synthesis Results and Overhead Analysis . . . . . . . . . . . . . . . 109
8.5.1.1 Area Overhead . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.5.1.2 Effectiveness and Scalability . . . . . . . . . . . . . . . . . 111
8.5.1.3 Critical Path Overhead . . . . . . . . . . . . . . . . . . . . 112
8.5.2 Influence of Control Logic Overhead on Total Circuit Area . . . . . 112
8.6 Selection of Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8.6.1 Using Design Diversity to Counter CMF . . . . . . . . . . . . . . . . 116
8.6.2 Attack Resilience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
9 Conclusion and Outlook 121
Bibliography 127
Appendix 137
vii
List of Tables
5.1 Cyclic Codes with Prime Degree Irreducible Divisors . . . . . . . . . . . . . 38
6.1 Closest Prime Number Distances from 2k for Practical Values of k . . . . . 59
7.1 Complexity of Various Extension Field Operations . . . . . . . . . . . . . . 85
7.2 Overhead of Robust OEF Arithmetic for ECC Point Multiplication . . . . . 86
7.3 Probability of Missing an Error in Robust OEF Arithmetic . . . . . . . . . 86
8.1 Simplex State Encoding for the Modular Exponentiation Example . . . . . 105
8.2 State Transition Table of the Fault-Resilient FSM . . . . . . . . . . . . . . 106
8.3 Initial Analysis of Various EDC and NMR Schemes . . . . . . . . . . . . . . 110
8.4 Analysis of Detailed FSM Models . . . . . . . . . . . . . . . . . . . . . . . . 111
8.5 Data Path / Control Logic Area Comparison . . . . . . . . . . . . . . . . . 113
8.6 Total Overhead for Modular Exponentiation Circuit with FT Control . . . 113
8.7 Minimum Code Lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.1 Factorizations of 2n + u . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.2 Factorizations of xn + x+ 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.3 Factorizations of xn + x2 + 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 138
viii
List of Figures
5.1 Two Strategies for Error Detection Under a Ring Homomorphism . . . . . . 29
6.1 Robust Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.2 Robust Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1 General Error Detection Architecture . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Data-flow Graph for the Digit-Serial Computation of TM . . . . . . . . . . . 79
7.3 Data-Flow Graph for the Digit-Serial Computation of TF . . . . . . . . . . 83
8.1 State Transition Diagram for Montgomery Ladder . . . . . . . . . . . . . . 93
8.2 Non-redundant State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.3 Fault-resilient State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.4 Lengths of Various t-Resilient Error Detecting Codes . . . . . . . . . . . . . 115
ix
List of Algorithms
6.1 k-bit Digit-Serial FIOS Montgomery Multiplication . . . . . . . . . . . . . . 63
6.2 Robust Montgomery Multiplication . . . . . . . . . . . . . . . . . . . . . . . 67
8.1 Joye and Yen’s Montgomery Ladder Exponentiation [JY02] . . . . . . . . . 94
x
Chapter 1
Introduction
In the course of only a few decades the field of cryptography has changed from a
black art that only few people in the military and secret services knew about, to an
active and openly researched science that is one of the key ingredients of the modern
internet economy. Beyond e-commerce it also starts to affect more and more aspects
of the public sector, with some examples being the adoption of cryptographic smart
cards for health care providers, and digital signature enabled electronic passports.
The pervasive use of cryptography brings with it many new challenges that were
not foreseen only a few years back when information security was mainly server-
centric. Ubiquitous security devices such as smart cards and security tokens lack the
traditional physical security perimeter of a server-based infrastructure. The small
form factor of these devices facilitates theft and reverse engineering techniques, and
thus requires new protection mechanisms for the sensitive data that is stored on them.
The current generation of cryptographic algorithms and protocols is rarely chal-
lenged by the computational resources of an attacker. Key sizes of 128 bits and more
for symmetric schemes and matching sizes for public key schemes offer sufficiently
large security margins to withstand even huge leaps in cryptanalytical progress. The
real, tangible threat stems from side-channel attacks in which an attacker uses certain
1
2 CHAPTER 1. INTRODUCTION
implementation-specific physical phenomena of the device to break the cryptosystem,
rather than to attack the algorithm directly.
Over the last decade a large body of industrial as well as academic research has
been devoted to the study of side channel attacks. In terms of high-level classifica-
tion we need to distinguish between passive attacks (attacker confined to the role of
observer), and active attacks (attacker can manipulate the device). Most of the R&D
effort to date has been focused on the development of countermeasures against passive
attacks, although early on Boneh, DeMillo and Lipton [BDL97] and also Joye, Lenstra
and Quisquater[JLQ99] effectively demonstrated the strong need for the protection
of various public-key systems against active attacks. Biham and Shamir [BS97] and
later Piret and Quisquater [PQ03] demonstrated successful fault attacks against the
data paths of various block ciphers, while only recently attacks on the control logic
of block ciphers have been reported by Choukri and Tunstall [CT05].
Fault injection based attacks have potentially devastating consequences on the
security of embedded systems. While their uses have been well known among practi-
tioners in the field, i.e. hackers and security evalutation professionals, treatment from
the academic community was until recently quite sparse compared to passive attacks
such as timing or power analysis. Even then, most papers on the subject matter seem
to focus mainly on the invention of new attacks or modification of existing attacks
to various cryptosystems. The contributions that deal with countermeasures to these
attacks seem to be in the minority.
Designers of security-critical embedded systems are in a perpetual struggle to
devise new countermeasures against adversaries, while at the same time keeping a
bound on the cost of the implementation. Ever new attack vectors keep appearing
which are easy to implement and difficult to anticipate. It seems surprising that pas-
sive attacks are generally better studied in the open literature than active tampering
attacks, given that the latter have been successfully (ab-)used in real-world scenarios
1.1. MOTIVATION 3
such as to produce counterfeit smart-cards for accessing pay-TV services [AK96].
For manufacturers of devices prone to any sort of tampering, it seems prudent to
invest not only in physical countermeasures such as tamper-proof coating and envi-
ronmental sensors, but also in error detection mechanisms. After all, the methods
of fault-injection are manifold (photo-flash, laser, ion-beam, power-glitch, radiation,
etc.), while the outcome is always the perturbation of data. Sensors for detecting
tampering activity are certainly a useful tool for increasing the level of physical secu-
rity, but they can never protect against all possible attack-vectors. By this reasoning,
error detection is an essential building block for secure systems under adversarial
conditions.
The current research effort to prevent active fault attacks is based on two strategies
which complement each other. On one hand there are techniques to reduce side
channel leakage in case of a successful fault insertion [JY02]. The other strategy is
aimed at increasing the fault-tolerance of cryptographic devices. The work presented
in this dissertation falls into the latter category.
1.1 Motivation
The security of several public-key cryptosystems relies on the intractability of the
discrete logarithm problem (DLP) in large finite groups. Such groups may be large
multiplicative subgroups of a finite field K, or the group of points on an elliptic curve
defined over K. The most common choices for K are prime fields Fp and binary
extension fields F2k . In the literature these are often referred to as Galois Fields,
in honor of the french mathematician E´variste Galois, and denoted as GF(p) and
GF(2k).
Arithmetic in Galois Fields serves as the mathematical foundation not only of
public key cryptography, but also many concepts in coding theory. Indeed cryp-
4 CHAPTER 1. INTRODUCTION
tography and coding theory are two very closely related fields with a large amount
of overlap. Finite fields and rings are mathematical structures that are common to
both areas and have found applications in the definition of cyclic codes, public-key
and symmetric cryptosystems. While the applications of both areas can be broadly
characterized as communication-centric, in practice they have traditionally remained
separate. Stated simply, coding theory provides for reliable communication over a
noisy channel, while cryptography provides for secure communication over an ideal
channel.
By viewing the data path of a public key arithmetic operation as a noisy environ-
ment that may be disturbed by an attacker, we aim to apply concepts from coding
theory to this “computational channel” in the hope to provide for reliable arithmetic
operation. This is not unlike early techniques in fault tolerant computing that aimed
to protect the arithmetic unit of a space-flight computer against errors caused by
radiation, albeit more focussed on the specific properties of finite field arithmetic.
1.2 Dissertation Outline
This dissertation consists of four major chapters (5 through 8) that represent the
major results of the author’s research. It is organized as follows: In Chapter 2 we
will provide the reader with an overview of related work specific to each of the four
major chapters. In Chapter 3 we will give an introduction into common side-channel
analysis techniques and their classification. Chapter 4 defines the adversarial error
model that forms the basis of our research. Chapter 5 contains the results of our
research on homomorphic embedding techniques for finite fields. We show how these
techniques may be used to provide redundancy and error detection mechanisms, and
how they relate to the theory of cyclic codes. In Chapter 6 we consider a worst-case
scenario with respect to attacker capability and introduce robust arithmetic codes
1.2. DISSERTATION OUTLINE 5
which offer nearly perfect error detection properties. Since the strong error detection
properties of these codes come at the price of large overhead, we show in Chapter 7
how one may strike a balance between robustness and low overhead, based on a
special class of finite fields. The data path-centric view of the previous chapters is
complemented by Chapter 8 in which we present novel techniques for the protection
of control structures in cryptographic circuits.
6 CHAPTER 1. INTRODUCTION
Chapter 2
Related Work
2.1 Homomorphic Embedding
Early work on fault tolerant cryptography has either revolved around the use of sim-
ple parity prediction schemes or adapted traditional mechanisms like triple modular
redundancy (TMR) and time redundancy for determining the correct result in the pres-
ence of errors. It seems, however, that most of the current effort is concentrated on
concurrent error detection (CED) schemes for symmetric ciphers. Efforts to provide
error detection capabilities to public key schemes based upon finite field arithmetic
have so far only seen sporadic treatment. The prevailing approach is augmentation
of finite field multipliers over Fk2 with parity prediction capabilities [RH02, RMH04].
These techniques, however, do not make use of the rich mathematical structures
provided by finite rings and fields, which form the arithmetic foundation for many
cryptographic schemes. Furthermore, their error detecting capabilities are mostly
aimed at simple faults with only a single bit-flip, caused for example by single event
upsets (SEU) due to background radiation, but not multiple faults induced by an
intelligent attacker.
A strategy that has not been explored yet is the use of error detecting codes
7
8 CHAPTER 2. RELATED WORK
(EDC) with an arithmetic structure, specifically cyclic binary and arithmetic codes,
upon which our approach is based. The main difficulty is that our purpose is not
only to encode and decode data for transmission over a noisy channel. In viewing
the computation itself as a noisy channel, we aim to compute with encoded operands
while preserving arithmetic operations. We thus propose to embed finite field elements
into a larger ring via a suitable ring homomorphism, and to utilize the redundancy
for error detection purposes. Embedding techniques for finite fields have been used
before, e.g. for the implementation of efficient finite field multiplier architectures based
on redundant representation in cyclotomic rings [WHBG02]. The authors, however,
have not explored the usefulness of this redundancy for fault tolerance. Our work on
scaled embedding was motivated by earlier work on modulus scaling [Wal92, O¨SS04],
and the connection to coding theory. In [O¨SS04] a scaled modulus of special low
Hamming-weight form was used to enable low-complexity modular reduction, but the
redundancy was not used for error detection. By additionally scaling the operands
with a constant factor, the information is spread out across an extended range of bits.
This allows the detection of errors by simply dividing out that factor and checking
for the remainder to be zero.
The results of this chapter were originally presented at the Fault Diagnosis and
Tolerance in Cryptography Workshop in 2005 and published in [GS05].
2.2 Robust Arithmetic Residue Codes
During the early years of fault-tolerant computing, residue codes were proposed
[RG71] as a means for systematically encoding integer operands and checking their
arithmetic operations for errors. The check symbol in residue codes is computed as
the remainder of the operand (or its complement) with respect to the check modulus,
usually a prime. Several variations such as multi-residue codes were also introduced
2.3. LOW COST TECHNIQUES 9
early on, as well as non-systematic arithmetic codes such as AN codes1. Designed
for the purpose of detecting only sporadically occurring bit errors, their arithmetic
distance is limited to 2 or 3. Mandelbaum [Man67] introduced arithmetic codes with
larger distance properties, however, with an unattractively large amount of redun-
dancy.
Unfortunately, due to the linear nature of their encoding scheme, standard arith-
metic residue codes do not offer robustness properties, since any error pattern which
itself is a codeword can not be detected, irrespective of the actual data. In this work
we extend the idea of residue codes to non-linear residue codes and show how their
arithmetic properties can be used to provide strong error detection under even the
most challenging adversarial conditions. Our construction for robust arithmetic codes
is versatile enough to be applied to any fixed width data-path for digit serial general
purpose arithmetic. Using this code it is possible to protect any type of integer ring
or prime field arithmetic, e.g. RSA, Diffie-Hellman, Elliptic Curves over GF(p), etc.,
against active adversaries.
We presented our results at the Fault Diagnosis and Tolerance in Cryptography
Workshop in 2006 and published them in [GSK06].
2.3 Low Cost Techniques
A common approach to prevent errors in cryptographic hardware involves the use of
existing redundancy, for example by decrypting an encrypted result and comparing
to the input. In other cases simple concepts from classic linear coding theory (parity
prediction, Hamming codes) are applied to standard circuits, for low-cost protection
against basic faults.
However, with the advent of more advanced attack techniques and more accurate
1Named after their multiplicative encoding procedure N ′ = AN , where A is the code generator
value and N is an operand
10 CHAPTER 2. RELATED WORK
error and attacker models, these simple techniques may prove to be inadequate. Ef-
fective hardening of systems against sophisticated and malicious fault attacks requires
strong error detection under worst case assumptions rather than average case. At the
same time there is always the concern about overhead of error detection solutions,
since more complex circuits are also more likely to fail. The robust arithmetic residue
codes mentioned above, unfortunately fall into this category.
In order to strike a balance between robustness and overhead we show in this
chapter how Karpovsky and Taubin’s [KT04] original construction may be applied
to a type of specialized finite extension fields named optimal extension fields (OEF).
This new construction preserves the original code’s strong error detection properties,
yet with almost negligible overhead.
The results of this chapter are currently under preparation for publication.
2.4 Tamper Resilient Control Structures
So far most of the work in fault tolerant cryptography is centered around concurrent
error detection (CED) techniques applied to the data path [RMH04, GS05]. Other
techniques take advantage of architectural features, such as independent encryption
and decryption units to compute the inverse operation for parallel computation and
comparison of the result [JWK04]. While this strategy works well with symmetric key
algorithms such as the AES [BBK+02], it may not work in a public-key setting, for ex-
ample if the private key is not available to check the result of an encryption [ABF+02].
We are currently unaware of any strategies to also protect the control logic (state
machine, sequencer, execution pipeline, etc.) of cryptographic systems against in-
terference from the outside. A literature search revealed that there are several
decades worth of research on the design of fault-tolerant state machines in clas-
sic application domains like automation control, avionics, and space-borne systems
2.4. TAMPER RESILIENT CONTROL STRUCTURES 11
[Ren78, CT91, Ber04]. While it is true that many of those findings could also apply
to cryptographic systems, our main concern is that the fault model is fundamentally
different. Fault-tolerant digital systems are typically designed around the premise
that faults only occur one at a time and that there is sufficient time to recover be-
tween faults. Thus all that is needed to build a fault secure system is the ability to
recover from the set of single faults [Pra86]. Such an assumption seems reasonable as
long as the faults are caused by stochastic events like natural background radiation,
or the random failure of components over time. In a cryptographic setting we deal
with faults of an adversarial nature, caused by an intelligent attacker, who can be
assumed to know about the structure and thus certain weaknesses of the system. In
this chapter we therefore explore design techniques for tamper resilient finite state
machines that employ error detecting codes with large minimum distance.
Our results have been accepted for publication in the IEEE Transactions on Com-
puters [GSS07].
12 CHAPTER 2. RELATED WORK
Chapter 3
Side-channel Analysis: A
Taxonomy
The standard literature on cryptography defines a cryptographic system abstractly
as a black box with an input and an output. The input accepts the plain text, while
the output provides the cipher text. Implicitly we also shall assume that the key is
loaded into the black box during a safe time via some appropriate mechanism, and
stored for later use in the field, but cannot be retrieved out of the box. In the physical
world this description is insufficient, as it is nearly impossible to completely isolate
a device from its environment. In reality there are many unintentional input and
output channels, in the following named side-channels, which can reveal the inner
workings of the device.
A side-channel attack, or more formally a side-channel analysis, is an attack on a
cryptographic device that makes use of one or more of these covert channels to obtain
information about the key material, thus breaking the cryptosystem. Over the last
two decades, many different classes of side-channel attacks have been studied in the
literature and also applied in practice.
In general one may classify side-channel analysis into two major categories, de-
13
14 CHAPTER 3. SIDE-CHANNEL ANALYSIS: A TAXONOMY
pending on whether a device is simply observed or whether it is being acted upon by
the attacker. Observation-only attacks are termed passive, while attacks involving
interaction with the device are termed active. Just like the regular interface of a
cryptosystem defines input and output channels, side-channels may also be classified
in terms of the direction of information flow. Physically observable phenomena that
result from activity inside the device consitute a covert output channel. Similarly, any
kind of external manipulation that the device is exposed to and which has an effect on
the device’s inner workings can be considered a covert input channel. Moreover, an
external manipulation that has a discernible effect, i.e. that can be observed through
any (overt or covert) output channel, constitutes an interaction with the device.
3.1 Passive Side-Channel Analysis
The following physical phenomena have been identified in the literature as effective
for passive side-channel analysis:
• Timing information
• Power consumption
• Electromagnetic emissions (including thermal phenomena)
• Accoustic emissions
Timing analysis, first introduced by Kocher [Koc96] in 1996, works by measur-
ing the data-dependent differences in execution timing of na¨ıve implementations to
determine the Hamming weight of the secret key. It applies mainly to modular expo-
nentiation based public key cryptosystems, such as RSA or Diffie-Hellman, where the
exponentiation is implemented as a simple square-and-multiply algorithm. Common
countermeasures include multiply-always strategies aimed at constant time execution.
3.2. ACTIVE SIDE-CHANNEL ANALYSIS 15
Power [KJJ99] and electromagnetic analyses [GMO01, QS01] measure the power
consumption and the radio frequency spectrum of a cryptographic device over time
to deduce information about the data that is being processed, specifically the key-
dependent data. It can be successful even if precautions to timing analysis have been
taken. Simple power (SPA) and electromagnetic analysis (SEMA) operate on a single
power or EM trace acquisition, while differential analysis (DPA and DEMA) acquires
several thousands of traces and computed their average signal to filter out random
variations. Then, by computing the difference between the average power trace and a
single acquisition, even subtle data-dependent power consumption becomes visible in
the trace, making this attack very powerful. While power analysis may be mitigated
by supply power filtering at the terminals of the device, EM analysis can bypass
these countermeasures by sampling the emissions with high accuracy directly above
the circuit using micro-coils.
Common countermeasures against power and EM analysis require a tremendous
effort on the part of the circuit designers, who need to go to great lengths to bal-
ance power consumption or shield the device from emitting EM radiation. Other ap-
proaches use adaptive masking techniques for randomization and hence, de-correlation
between the power signature and the secret information.
3.2 Active Side-Channel Analysis
Active attacks are much more powerful than passive attacks, since they no longer
confine the attacker into the role of an observer. Through the deliberate insertion
of faults into the computation an adversary might cause the leakage of secret key
information. The consequences of not employing at least an error detection scheme
have been demonstrated vividly in [BDL97]. Protecting against this class of attacks
requires more than elaborate circuit tricks; it requires a mechanism for detecting mod-
16 CHAPTER 3. SIDE-CHANNEL ANALYSIS: A TAXONOMY
ification of data, faulty behavior of the arithmetic circuit, or both. Just as traditional
mission critical applications like avionics and systems working under harsh environ-
mental conditions require fault tolerant design techniques, it becomes increasingly
important for embedded security devices operating in a hostile environment.
Active side channel attacks rely on the manifestation of injected faults as erroneous
results which can then be observed either at the output of the device, or through
some covert side-channel. The error is therefore the difference between the expected
output x and the observed output x˜ = x+ e. We will define the error model and the
capabilities of the adversary more precisely in the following chapter.
In 1996 Boneh, DeMillo and Lipton [BDL97] demonstrated painfully how vulner-
able straightforward implementations of public-key cryptographic algorithms are to
a class of attacks now commonly referred to as “Bellcore attacks”. Here we give a
short summary of the attack as it applies to the RSA signature scheme.
It is commonly known that the Chinese Remainder Theorem (CRT) may be used
to speed up the computation of the RSA signature algorithm, due to availability
of the factors p and q of the RSA modulus N . Let n = dlog2Ne be the length
of the modulus in bits, and thus typically p and q are of size n/2 bits. The RSA
signature algorithm computes the modular exponentiation y = xd mod N , where x is
the message digest, d is the private key, and y is the resulting signature. Instead of
computing a n-bit exponentiation which has a time complexity of O(n3) assuming a
quadratic complexity multiplication algorithm, the CRT allows one to compute two
n/2-bit exponentiations yp = x
dp mod p and yq = x
dq mod q and combining them
into the final result y = yq + q((yp − yq)q−1 mod p). This reduces the complexity of
exponentiation to O((n/2)3) and results in an overall speed-up of almost factor 4.
The Bellcore attack consists of obtaining one correct signature S and one faulty
one S ′ by inducing a random fault in one of the two n/2-bit exponentiations. The only
requirement for success is that the fault only affects either yp or yq but not both. After
3.2. ACTIVE SIDE-CHANNEL ANALYSIS 17
obtaing S and S ′, the attacker may simply compute the greatest common divisor of
the signatures’ difference and the RSA modulus N to obtain the factorization of N :
gcd(S − S ′, N) = r, where either r = p or r = q with high probability.
Shortly after their findings were discussed in the cryptographic community, Shamir
[Sha99] proposed a simple and “low-cost” countermeasure which consists of scaling
the moduli p, q by some small random factor j and a slightly modified derivation of
the partial exponents. These modifications of the standard CRT algorithm allow the
detection of random errors during the exponentiation. Shamir’s trick, however, was
shown to be flawed by Aumu¨ller et al. [ABF+02], since it does not protect all steps of
the computation. More advanced protection schemes were proposed by Blo¨mer et al.
[BOS03] and Yen et al. [YJ00], and there exist claims that some of them can be broken,
too [Wag04], although this seems to be disputed. The discoveries of the Bellcore team
triggered additional work by other researchers who extended the technique to other
cryptosystems [JLQ99], and also spawned a new type of active attack on symmetric
key cryptosystems such as block and stream ciphers, called differential fault attack
(DFA) [BS97].
Apart from Bellcore style attacks there exists another type of fault attack, which is
aimed at common countermeasures to passive attacks. In order to prevent power and
electro-magnetic analysis techniques, many VLSI implementations nowadays employ
power balanced logic gate libraries, whose power consumption and hence electro-
magnetic emanations are data-independent. New fault attacks are aimed at intro-
ducing glitches into the circuit which cause such gates to ‘lose balance’, i.e. reveal
data through power imbalances. This opens the door to various classical attacks
on the circuit, like simple and differential power (SPA,DPA) and electromagnetic
(SEMA,DEMA) analysis. All this demonstrates the urgent need for a truly robust
error detection scheme.
18 CHAPTER 3. SIDE-CHANNEL ANALYSIS: A TAXONOMY
Chapter 4
An Adversarial Error Model
Based on the discussion of active tampering attacks we will now establish an adver-
sarial error model which allows us to analyze the effectiveness of countermeasures.
We will make use of existing definitions borrowed from the testing and fault-tolerance
communities.
Generally one can say that errors are the manifestations of faults occurring in a
system. Some classes of faults may not cause an error, others may have devastating
consequences and cause multiple errors at the same time. Faults can be broadly
categorized into permanent and transient faults:
Permanent Faults can be caused by manufacturing defects, long-term slowly de-
structive effects such as electromigration, or by a sudden disruptive force, for
example a static discharge. In an adversarial setting, a permanent fault may be
caused by an attacker who has access to a focussed ion beam (FIB) workstation,
intent on manipulating the functionality of a device.
Transient Faults are non-permanent faults, which have an effect only for a lim-
ited amount of time, usually only a couple of clock-cycles or fractions thereof.
These may be caused by environmental effects that temporarily drive the op-
erating conditions of the device outside of the normal range for which it was
19
20 CHAPTER 4. AN ADVERSARIAL ERROR MODEL
designed, such as fluctuations in supply voltage or temperature. As before, we
are more interested in adversarial faults, which may be induced in a variety
of ways. An attacker may try to emulate environmental effects and raise the
operating temperature of the device or manipulate the power supply. Other
specific techniques are described further below.
Successful induction of faults depends on a variety of factors, such as the physical
characteristics of the device under attack, the method of fault induction and the
existence and type of countermeasures. Starting with the latter, the presence of
countermeasures determines which methods of fault injection can be applied in the
first place. For example, if the device has a unit for monitoring the quality of the
supply voltage and can perform an emergency shutdown if necessary, then a glitching
attack is unlikely to succeed.
Every method of fault injection has different characteristics in terms of location
and effect of the fault. Some techniques such as power supply glitching may have a
more global effect, for example in the case where there is only a single power connector.
In other cases such as a general purpose CPU, there may be several ground and supply
voltage connectors (GND and VCC), each of which may supply power to a particular
functional group of the device. With a finer granularity of power supply lines the
attacker also gains better control over the locality of the fault.
The precision of fault injection can be characterized in terms of spatial and tempo-
ral resolution of faults, which ultimately determines the precision with which specific
error patterns can be caused in the device:
Spatial resolution is a measure of precision with regard to the physical dimensions
of the device.
Temporal resolution measures the precision with regard to fault insertion at a
particular time.
4.1. FAULTS FROM AN ATTACKER’S PERSPECTIVE 21
In addition to the classification of faults based on resolution, we may additionally
distinguish between two modes of failure that we define as follows: A common-mode
failure (CMF) [MM00] is a set of multiple, simultaneously occurring faults resulting
from a single cause, e.g. due to a glitch in the power supply, or a large area ionization
of an unpackaged integrated circuit with a radiation source. Highly regular circuits
made from replicated components are the most prone to common-mode failure, if the
cause of the fault is not one of limited locality. In contrast, a single-mode failure
(SMF) is a single fault that occurs due to a single cause. Oftentimes the effect of
the fault is locally limited. Examples of this type of fault are single stuck-at faults
caused early on by manufacturing flaws or later through breakdown resulting from
electro-migration. In an adversarial setting, a transient single-mode failure may also
be induced by means of a ‘light-attack’. This refers to the introduction of single bit
errors into an unpackaged integrated circuit by means of a highly focused laser beam
or other ionization source [SA02]. It requires a high degree of spatial and potentially
also temporal precision.
4.1 Faults from an Attacker’s Perspective
From an attacker’s point of view transient faults are more useful than permanent
faults for many reasons. For one, he may only possess a single device to attack and
a permanent fault may render the device non-functional. Secondly, several of the
known fault attacks require faulty and non-faulty output in order to succeed. In the
case of permanent faults there is only a single opportunity to obtain useful output
on which to mount the attack. The infamous fault attack on the CRT-based RSA
signature scheme by Boneh et al. [BDL97], for example, assumes a simple register
fault model, in which an almost arbitrary portion of an intermediate value (possibly
a single bit) is changed during one of the two modular exponentiations. It does not
22 CHAPTER 4. AN ADVERSARIAL ERROR MODEL
matter which of the two exponentiations is affected, as long as it is only one of them
and not both.
4.2 Fault Attack Categories
Adversarial faults may be introduced into the cryptosystem in a variety of ways. The
spatial resolution with which an attacker is able to carry out the procedure depends
largly on the type of fault attack, which may be from one of the following three
categories.
Non-invasive Fault Induction In a non-invasive setting the mechanical integrity
of the device under attack and its shell is unharmed. This means essentially that
no attempt has been made to remove any protective coating of the chip, neither the
regular IC packaging, nor an additional layer of tamper-proof coating. Any method
of fault induction under this category would be limited to either the device’s electrical
terminals or electromagnetic radiation.
Invasive Fault Induction When the device’s mechanical integrity has been sev-
ered, e.g. for modification of the electric circuit or circumvention of anti-tampering
countermeasures, we speak of an invasive attack. This method gives the attacker
the highest level of control over the fault insertion, but it also carries the highest
risk of accidentally erasing key material in the process, for example by triggering a
countermeasure such as those mentioned in the beginning of this section.
Semi-invasive Fault Induction The hybrid approach of removing the packaging
material of the chip but keeping the circuit intact is characteristic of a semi-invasive
attack. With access to the chip’s surface, an attacker may use optical ionization
sources to trigger transient faults in the device with high spatial resolution.
4.3. ABSTRACT ERROR MODEL 23
Advanced attacks have been demonstrated under laboratory conditions, e.g. in
[BOS03] where an electric spike generator was used to influence the power supply
with precise control. Also, Skorobogatov reported on semi-invasive techniques such
as optical fault inductions [SA02], where a high-energy light source, e.g. a camera
flash or a laser pointer, were used to induce single bit-flips in SRAM cells of a de-
packaged micro-controller. These advanced attacks do not require a great deal of
sophistication or expense, yet are very powerful.
4.3 Abstract Error Model
Our aim is to abstract the effect of a fault induction (the error) from its method,
in order to obtain an analytical model, which we can then use to derive effective
countermeasures. We thus model a successful fault induction as the additive distortion
of data or logic values in the device.
Depending on the concrete function of the area of the device which is affected
by the fault, there are two principle ways to characterize the error. A logical error
is a bitwise distortion of the data, usually modeled as the exclusive OR of data and
error, i.e. x˜ = x ⊕ ex, while arithmetical errors admit the propagation of carries up
to the range limit: x˜ = x + e mod 2k, where k is the width of the data path. The
former is appropriate for storage dominated devices (register files, RAM, flip-flops,
etc.), the arithmetic error model is more useful for arithmetic circuits such as adders
and multipliers.
The multiplicity of an error is an important parameter with consequences for the
performance of error detection strategies. In information theory the weight of a code
word denotes the number of non-zero symbols. Similarly, the distance between two
code words denotes the number of symbols in which the two values differ.
24 CHAPTER 4. AN ADVERSARIAL ERROR MODEL
Logical Errors can be modeled in exactly the same way that transmission errors
in basic coding theory are modeled. The weight of an error term ex is its Hamming
weight (short: HW(ex)) and the distance between two codewords x, y is the Hamming
weight of their difference HW(x− y).
Arithmetical Errors are modeled slightly different, due to the influence of carry
propagation. The binary arithmetic weight (BAW) of an integer N is defined as
the number of non-zero coefficients in the minimal non-adjacent representation of
N =
∑k
i ni2
i with ni ∈ {−1, 0, 1}.
Due to the arithmetic structure of public-key cryptographic primitives addressed
in this dissertation, we model all data-path errors as arithmetical errors and all errors
on the control logic as logical errors. As a consequence of the adversarial nature of
fault attacks, our error model is not limited to any error patterns of a specific weight,
nor does it assume certain errors to occur more frequently than others.
Chapter 5
Cyclic Codes and Homomorphic
Embedding
In this chapter we introduce a technique for fault-tolerant finite field arithmetic with
applications to public-key cryptosystems defined over finite abelian groups (usually
the group of points on an elliptic curve or the multiplicative subgroup of a finite field).
Since elliptic curve based systems are also defined over finite fields, fault-tolerant finite
field arithmetic is therefore at the heart of tamper-resistant public-key cryptography.
We propose scaled embedding as a new approach to fault tolerant finite field arith-
metic that is based in principal on two important classes of linear codes, which are
arithmetic codes and binary cyclic codes. The usefulness of these codes for our pur-
poses stems from the fact that they possess the same basic arithmetic structure as
most public key cryptographic schemes. Public-key arithmetic is commonly based on
integer and binary polynomial rings and fields, most notably in Elliptic Curve Cryp-
tography. The theories of both classes of codes contain a significant amount of overlap,
which suggests a unified treatment. It is possible and useful to view the encoding of
operands along with their arithmetic operations as a ring homomorphism.
Under a ring homomorphism elements from the non-redundant ring or field F
25
26 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
can be embeddded into a larger, redundant ring R by means of multiplication with
a constant scaling factor. While the techniques that we introduce in the following
sections may not always adhere to the strict definition of cyclic codes, a less stringent
definition allows a more flexible choice of scaling factors. When robustness with
respect to minimum distance is required, one can always fall back on the special case
of cyclic codes, for which a designed distance metric can be calculated.
In our scheme, arithmetic operations executed in R preserve the operations that
otherwise would have to be executed without redundancy in F . This redundancy
can be utilized in every step of the computation to detect errors caused either by
transient faults due to circuit crosstalk and radiation, or by malicious fault insertion
from an adversary. Our method serves two mutually beneficial purposes: on one
hand the redundancy of the larger ring can be used for error detection, which is our
main objective. On the other hand, the particular low-weight representation of the
ring modulus lends itself particularly well to the efficient direct method for modular
reduction, based on shifts and additions / subtractions. The negative performance
impact of the larger encoded operands is therefore mitigated to some degree.
Our method constitutes a generalization of classic cyclic codes for fault-tolerant
arithmetic. This generalization is what enables us to obtain a larger choice of code pa-
rameters, thereby allowing hardware implementations with flexible time-space trade-
offs.
5.1 Introduction
Finite field arithmetic operations such as addition and multiplication form the foun-
dations on which most cryptosystems are built. Our goal in this chapter is to define
a redundant representation of a finite field along with its associated arithmetic oper-
ations, which (a) preserves the original field operation and (b) allows the detection of
5.1. INTRODUCTION 27
errors which may arise due to either random or malicious insertion of faults into the
computation.
Our intention is to devise an error detection scheme which utilizes the rich math-
ematical structure of finite fields and rings. For this we want to use a transformation
through which we introduce redundancy in our representation and thereby gain error
detection capabilities. We would like to be able to map from a ring or field F to a
redundant ring G, while preserving the arithmetic. Here F may be, for example, the
ring Zm of integers modulo m. Formally the elements of Zm are cosets of the form
a+mZ, but it is usual and customary to simply refer to them as a, where 0 ≤ a < m.
We need a transformation function φ : F → G which maps between the additive
identities and preserves the addition and multiplication operations in the rings (or
fields) F and G, and thus forms a ring homomorphism:
Definition 5.1.1. Let F and G be two finite rings. A ring homomorphism is a
function φ : F → G, which satisfies the following properties:
1. Addition is preserved: φ(a+ b) = φ(a) + φ(b), for all a, b ∈ F
2. Multiplication is preserved: φ(ab) = φ(a)φ(b), for all a, b ∈ F
Example 5.1.1. A function φ : Z16 → Z20 becomes a ring homomorphism via
a 7→ 5a.
There are two potential strategies to achieve fault tolerant arithmetic using ring
homomorphisms. They are based on the difference between the two major classes
of codes: separable and non-separable. The former ones are also called systematic
codes, since each codeword can be split into an information part (the original data
in unchanged format) and a redundant part which contains extra digits, the check-
symbols. For non-separable codes the roles of information- and check-symbols are
not clearly defined, since encoding procedures tend to ‘spread’ the information over
28 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
multiple digits in the resulting codeword. We differentiate between the following two
principal strategies:
• A homomorphism φ which is not one-to-one may be used to create an additional
verifier datapath besides the original main data path. It mimics all computa-
tions on the main data path using the homomorphic images of the operands
obtained through the encoding φ(·) (cf. Fig. 5.1(a)). At the end of the opera-
tion the image φ(c) of the result from the regular main data path is compared to
the output of the verifier data path. This strategy is somewhat similar to that
used in parity prediction circuits, e.g. in [RH02], although a bit more general.
• If φ is one-to-one (i.e. φ(a) 6= φ(b) whenever a 6= b) then we speak of embedding
F into G, and the difference in cardinality establishes the amount of redundancy
present in the embedding. After mapping all operands from F to G via the
homomorphism φ(·), all computation is carried out in G. The integrity of the
mapping is verified at intermittent check-points between operations. At the
very end of the computation the result is converted back to F via the decoding
function φ−1(·) (cf. Fig. 5.1(b)), which exists when φ is one-to-one.
5.1.1 Separate Codes
The main idea of this approach is to have two individual data paths perform equivalent
operations on different, yet related, sets of data. The original data path is unchanged
from the non-redundant version, while the redundant data path may be smaller,
operating on images of the original data, the check-symbols, which are usually not
unique. The smaller bitlength representation of the check-symbols implies that the
mapping function φ(·) is not one-to-one, and |F| > |G|. If φ is a ring homomorphism,
it preserves the arithmetic of the original data in the image. We can thus compare
5.1. INTRODUCTION 29
φ(Α), φ(Β)
Datapath Datapath
A,B ∈F ∈G
Co
m
pu
ta
tio
n
C = AB
φ(.)
φ(.)
?φ(ΑΒ)
φ(Α)φ(Β)
Comparison
Co
m
pu
ta
tio
n
Main Verifier
(a) Separate Encoding
φ (.)−1
φ(Α), φ(Β)φ(.)
Non−Redundant
Representation
Redundant
Encoding
A,B ∈F ∈G
Co
m
pu
ta
tio
n
C = AB φ(ΑΒ)=φ(Α)φ(Β)
(b) Homomorphic Embedding
Figure 5.1: Two Strategies for Error Detection Under a Ring Homomorphism
the resulting image φ(a)◦φ(b) to the image of the result of the operation φ(a◦ b) and
check for errors.
The following example illustrates this approach.
Example 5.1.2. Let F = GF(p)[x]/(x2−1) the polynomial quotient ring with a(x) =
a1x + a0 ∈ F an element. Let further G = GF(p). The mapping φ : F → G
becomes a ring homomorphism via polynomial evaluation at x = 1, i.e. φ(a(x)) =
a(1) = a1 + a0 mod p, since it satisfies the properties in Definition 5.1.1. It preserves
arithmetic operations on two elements a = a1x+a0 and b = b1x+b0, such as addition
φ(a+ b) = φ((a1 + b1)x+ (a0 + b0))
= (a1 + a0) + (b1 + b0)
= φ(a) + φ(b)
30 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
and multiplication
φ(ab) = φ(a1b1x
2 + (a0b1 + a1b0)x+ a0b0)
≡ φ((a0b1 + a1b0)x+ (a1b1 + a0b0)) (mod x2 − 1)
= a0b1 + a1b0 + a1b1 + a0b0
= (a1 + a0)(b1 + b0)
= φ(a)φ(b) .
An implementation of this strategy would require only the addition of a small
secondary arithmetic data path to the existing primary ALU. Another example for
this strategy is the use of arithmetic residue codes. These are popular in traditional
fault-tolerant systems design and detect any single arithmetic error ±2i.
Although this approach appears interesting, its merits are limited. While residue
codes work nicely for regular integer arithmetic, the finite field and ring arithmetic
of most public-key cryptosystems introduces additional constraints which are hard
to overcome. For example, the natural mapping from a polynomial quotient ring
to its coefficient ground field that works so nicely in Example 5.1.2 is the exception
rather than the norm (notice that we evaluated at a root of the modulus x2 − 1).
Cryptographic fields are either large prime fields or binary extension fields of prime
degree, for which either no such natural subfield mappings exist1, or their usefulness
is severely limited.
5.1.2 Non-separate Codes
This leads us to an alternative approach to defining codes for fault tolerant arithmetic,
which is again based on homomorphic mapping. This time, however, the ring F is
mapped into a larger ring G, and arithmetic is only performed in G instead of in both
1With the notable exception of optimal extension fields (OEF).
5.1. INTRODUCTION 31
rings in parallel (cf. Fig. 5.1(b)). We speak of embedding F into G and using the
additional redundancy for error detection.
One may be tempted to exploit the natural embedding provided by the field / sub-
field relationship of field extensions. For instance, we may carry out arithmetic in a
finite field F = GF(qk) by embedding all operands into an extension G = GF((qk)m).
This method has the advantage of carrying the same field operations, i.e. addition,
multiplication, inversion. To construct such an embedding we need a function map-
ping elements from F to G, i.e. an injective map φ : F 7→ G. However, having
both the domain and the range of the mapping to be fields may prove to be too
restrictive, as the smallest extension would already double the size of the represen-
tation. As an alternative we may define the mapping from a field F = GF(qk) to a
ring R = GF[x]/r(x), with deg(r(x)) > k, which offers more flexibility in choosing
suitable parameters.
As described in the previous section the mapping function φ(·) needs to represent
a homomorphism and preserve the addition and multiplication operations. Inciden-
tally, this is the same principle behind cyclic codes, which also possess the arithmetic
structure of finite rings. Unfortunately, the precise definition of a cyclic code in the
literature is relatively restrictive, which makes it hard to find codes that allow the
embedding of cryptographically significant prime degree extension fields. By chang-
ing the definition slightly we can find a similar class of codes with a much larger
set of parameters, in which we can embed these fields. But before we talk about
these changes in definition, we would like to start with a brief review of cyclic codes.
Also, we will talk about cyclic arithmetic codes which represent the integer analogy
to cyclic codes.
32 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
5.2 Review of Relevant Coding Theory
In coding theory the alphabet over which codes are defined is typically chosen as a
finite field Fq = GF(q). The code itself is a subset or, in the case of linear [n, k]q
codes, a k-dimensional subspace of the n-dimensional vector space Fnq . Codewords
are usually represented as ordered n-tuples c = (cn−1, cn−2, . . . , c0), with ci ∈ GF(q).
5.2.1 Cyclic Codes
Cyclic codes are a family of linear codes with a rich algebraic structure, which is
tremendously helpful in their analysis. It is therefore customary to associate with each
codeword c = (cn−1, cn−2, . . . , c0) the code polynomial c(x) = cn−1xn−1 + cn−2xn−2 +
. . .+ c0 of degree less than n.
The distinct property of cyclic codes that sets them apart from other linear codes
is their invariance under cyclic shifts.
Definition 5.2.1. The cyclic shift c′ = c ª 1 of the codeword c by one position to the
left denotes the q-ary n-tuple (cn−2, cn−3, . . . , c0, cn−1) with its associated polynomial
c′(x) = cn−2xn−1 + cn−3xn−2 + . . .+ c0x+ cn−1. A linear code C is a cyclic code if for
any codeword c in C, c′ = c ª 1 is also in C.
It is easy to see that the polynomial c′(x) is equivalent to the product xc(x)
mod xn − 1, thus establishing the correspondence
c ª 1←→ c(x) 7→ xc(x) mod xn − 1 .
The cyclic shift c′(x) = c(x) ª i by i positions is therefore equivalent to computing
the remainder modulo xn−1 of the product xic(x) : c′(x) = cn−i−1xn−1+cn−i−2xn−2+
. . .+ cn−i+1x+ cn−i. Due to the finite length n of codewords it is straightforward to
see that a right cyclic shift can be expressed as a series of n− 1 left cyclic shifts.
5.2. REVIEW OF RELEVANT CODING THEORY 33
By linearity, any scalar multiple of a code polynomial ac(x), where a ∈ Fq, is
also in C. Accordingly, any linear combination of scalar multiples and cyclic shifts of
codewords, again yields a codeword in C, e.g.
(a0 + a1x+ a2x
2)c(x) = a0c(x) + a1xc(x) + a2x(xc(x)) .
is a codeword (when reduced modulo xn− 1) whenever c(x) is. This provides us with
a one-to-one correspondence between cyclic codes of length n over Fq and the ideals
I of the polynomial quotient ring Fq[x]/(xn − 1), which is essentially the well-known
theorem first recognized by Eugene Prange in 1957 [Pra57].
In order to find cyclic codes, we briefly review an algebraic formulation of the
problem. Using this formalism, cyclic codes of length n can be constructed by finding
the monic divisors of xn−1 over GF(q). Specifically we need to construct a generator
polynomial g(x) of degree r, which will be used to encode data polynomials of degree
less than k = n− r by means of regular polynomial multiplication.
Definition 5.2.2. Let d1(x), d2(x), . . . , d`(x) be the irreducible monic divisors of x
n−
1 over GF(q). Any product of these divisors g(x) =
∏`
i=0 di(x)
ci , ci = {0, 1} can serve
as a generator polynomial, since the only requirement for g(x) is that it be a divisor
of xn − 1 itself.
For cyclic codes, the encoding process is easy. For a given generator polynomial
of degree r = n − k and a data polynomial a(x) of degree less than k we obtain
its codeword as c(x) = a(x)g(x) mod xn − 1. For example, let q = 2, n = 7: the
irreducible monic divisors of x7 − 1 over GF(2) are
d1 = x+ 1
d2 = x
3 + x+ 1
d3 = x
3 + x2 + 1 ,
34 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
and thus for g(x) = d1d3 = x
4+x2+x+1, the ideal C = {a(x)g(x) : a(x) ∈ F2[x]} is
the [7,3]-Simplex code (the dual of the [7,4]-Hamming code) as a set of polynomials
in F2[x]/(x7 − 1).
5.2.2 Arithmetic Codes
While linear cyclic codes were developed principally for the transmission of data,
arithmetic codes were developed as a means to protect the arithmetic unit (AU) of
mission critical computers, most notably the STAR computer at JPL [Avi67].
Non-separate arithmetic codes, often also referred to as AN codes, are conceptu-
ally similar to cyclic codes in many aspects, although there are some fundamental
differences. Both types of codes form an ideal in a ring, but for arithmetic codes this
ring is the integers modulo m, Zm, as opposed to the previously mentioned polyno-
mial quotient ring for non-arithmetic cyclic codes. Codewords of both types may be
represented as n-tuples, but must be interpreted differently due to the difference in
arithmetic structure. Specifically, codewords of a cyclic code are n-tuples over the
finite alphabet Fq that may be represented as polynomials, but the codewords of an
arithmetic code are elements of the integer ring Zm, and the components of the n-
tuples are digits in base-q representation. Apart from a different notion of distance
and weight of codewords (cf. Section 4.3), this has consequences for the propagation
of single bit errors in the representation. Despite similar representation an error in
a single component of an arithmetic codeword may affect other digits of the result
of an addition, while for a non-arithmetic code the error is constrained to the same
digit position.
We will now give a precise definition for arithmetic AN codes.
Definition 5.2.3. [Gor87] Let m ∈ Z+. An arithmetic AN code is a subgroup
C = {AN |0 ≤ N < M} of Zm, where m = AM , for some A,M ∈ Z+.
5.2. REVIEW OF RELEVANT CODING THEORY 35
According to this definition the arithmetic AN code C is the ideal in Zm generated
by A, and M is the number of codewords in C.
Definition 5.2.4. For m = qn − 1, the left cyclic shift of a q-ary n-tuple c =
(cn−1, cn−2, . . . , c0) in Zm is the n-tuple c′ = (cn−2, cn−3, . . . , c0, cn−1). As before,
we say a code is cyclic if the left cyclic shift of any codeword is again a codeword.
It is straightforward to see that this representation is equivalent to the product
qc mod m, thus establishing the correspondence
c ª 1←→ c 7→ qc mod m .
As usual, a right cyclic shift can be represented as a series of n− 1 left cyclic shifts.
One could also define a left cyclic shift of a codeword by i positions, thus obtaining
the general correspondence
c ª i←→ qic (mod qn − 1)
or
c ª i←→ (cn−i−1, cn−i−2, . . . , c0, cn−1, . . . , cn−i+1, cn−i) .
Theorem 5.2.1. We immediately observe that any arithmetic AN code is cyclic in
the sense of Definition 5.2.4.
Example 5.2.1. We can construct a cyclic arithmetic code based on the factorization
of m = 2n − 1. Let n = 6, then m = AM factors into A = 9 and M = 7. This AN
code consists of the 7 codewords from the set C = {0, 9, 18, 27, 36, 45, 54}, represented
as the binary 6-tuples
(0, 0, 0, 0, 0, 0) (0, 0, 1, 0, 0, 1) (0, 1, 0, 0, 1, 0)
(0, 1, 1, 0, 1, 1) (1, 0, 0, 1, 0, 0) (1, 0, 1, 1, 0, 1)
(1, 1, 0, 1, 1, 0)
and can be used to embed the prime field Z7 in Z63.
36 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
Remark. Due to the Mersenne-like form of the ring modulus there exist two equiv-
alent binary encodings of the zero codeword, the all-zero and the all-one bit vector.
Although the latter is strictly speaking not a codeword, it may arise as a result of
the addition of two codewords.
5.2.3 Cyclic Code-Based Homomorphic Embedding
We can now begin to describe a connection between cyclic codes (in both senses) and
ring homomorphisms. For example, the arithmetic AN code of Example 5.2.1 can be
viewed as the image of the homomorphism φ : Z7 → Z63 via a 7→ 9a. Analogously, the
dual of the Hamming code can be viewed as an embedding of GF(8) inside F2[x]/(x7−
1) via φ : (a2, a1, a0) 7→ (a2x2+ a1x+ a0)(x4+ x2+ x+1). In both above cases, while
the additive structure is preserved in the ring, field multiplication is not preserved
under the product operation of the target ring, due to the extra multiplicative factor
introduced by encoding with the generator, i.e.
φ(a · b) = g · a · b 6= g2 · a · b = φ(a) · φ(b) .
This can be corrected in several ways, e.g. with an alternative definition of the product
operation, as we shall see a bit later on.
Choosing cyclic code-based embedding for error detection has an extremely useful
side benefit: by embedding the finite field into a quotient ring, arithmetic operations
are no longer subject to modular reduction by the field defining irreducible polynomial
or modulus. Any result of an arithmetic operation with a representation that exceeds
n digits, can be reduced very efficiently due to the congruences xn ≡ 1 and qn ≡ 1.
Low-weight ring and field moduli are well studied in applied cryptography: The
finite fields recommended by NIST for the elliptic curve digital signature algorithm
(EC-DSA [oSN00]) are defined by low-weight primes and irreducible trinomials or
pentanomials. In [O¨SS04], modulus scaling was used to obtain low-weight ring moduli
5.3. CRYPTOGRAPHICALLY SIGNIFICANT FINITE FIELDS 37
for efficient scaled modular arithmetic. The authors, however, did not make use of
the redundancy for error detection, which is the objective in this dissertation.
5.3 Cryptographically Significant Finite Fields
It is our goal to embed a finite field GF(q) into a larger ring, for q some prime power.
In the case where the target is to be a binary cyclic code, i.e. q = 2k, this means that
xn − 1 needs as a factor some degree k irreducible polynomial over GF(2). Similarly,
in the case of arithmetic codes, we require 2n− 1 to contain a large prime factor p of
bitlength |p| ≈ k bits.
At this point it seems as if all we need to do is to search for some suitable parame-
ters n, k, by way of factorization of the ring modulus. Unfortunately, with respect to
specific applications such as Elliptic Curve Cryptography (ECC) not all finite fields
are suitable choices for defining secure elliptic curve groups. Binary fields GF(2k)
with k composite are believed to be susceptible to a class of attacks based on Weil
descent [GHS02]. Without going too much into the details, Weil descent allows one to
map the elliptic curve discrete logarithm problem (EC-DLP), upon which the secu-
rity of elliptic curve cryptosystems is based, to a related problem on a hyper-elliptic
curve of larger genus, but much smaller subfield. This gives rise to index-calculus
attacks on the EC-DLP for which the time complexity depends largely on the field
size. In short, composite degree field extensions may reduce the security of elliptic
curve cryptosystems.
As a result we want to find cyclic [n, k]2 codes where x
n − 1 (mod 2) factors into
a large irreducible polynomial of prime degree k and some other smaller polynomials.
For fields with an extension degree in the range that is of interest for elliptic curve
cryptography, i.e. 130 ≤ k ≤ 500, there are exactly ten suitable choices, and each
carries a large amount of redundancy, in fact, they all have n = 2k + 1 as shown
38 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
in Table 5.1, which means that codewords are more than twice as long as the field
elements which we would like to embed. The situation is only slightly better with
cyclic arithmetic codes which are defined in a similar way. In this case the field
modulus and the amount of redundancy is determined by the integer factorization of
2n ± 1.
The advantage of using cyclic codes for embedding is that we can make certain
statements about the worst-case minimum distance (designed distance) of the code,
and therefore about its error detection capability, based on the BCH bound [van92]:
Theorem (BCH bound). Let C be a q-ary [n, k] cyclic code with generator poly-
nomial g(x). Let m be the multiplicative order of q modulo n (GF(qm) is thus the
smallest extension field of GF(q) that contains a primitive n-th root of unity). Let α
be such a primitive n-th root of unity in GF(qm). Select g(x) to be a minimal-degree
polynomial in GF(q)[x] such that g(αb) = g(αb+1) = . . . = g(αb+δ−2) = 0 for some
integers b ≥ 0 and δ ≥ 1, so g(x) has (δ − 1) consecutive powers of α as zeros. It
follows that the code defined by g(x) has minimum distance at least δ.
The designed distance δ given by this theorem (and listed in Table 5.1 for the code
parameters mentioned earlier) is not necessarily a tight bound. The true minimum
distance is often larger, so this theorem only gives a lower bound.
n 263 359 383 479 503 719 839 863 887 983
k 131 179 191 239 251 359 419 431 443 491
δ 8 9 9 13 9 11 11 9 9 11
Table 5.1: Cyclic Codes with Prime Degree Irreducible Divisors
5.4. A GENERALIZATION OF CYCLIC CODES 39
5.4 A Generalization of Cyclic Codes
In order to mitigate this scarcity of useful code parameters, we move from purely
cyclic codes to a new and more general family of codes with a slightly different
definition. This departure from cyclic codes was inspired by the techniques for scaled
modular arithmetic described in the reference mentioned earlier [O¨SS04]. By relaxing
the notion that the code has to be strictly cyclic, we can find rings in which we can
embed fields of nearly arbitrary cardinality, with flexible trade-offs between field size
and amount of redundancy. We thus obtain a generalized interpretation of codes
with homomorphic structure, in which arithmetic and cyclic codes constitute special
cases. The ring modulus can now take any form, although for performance reasons
we prefer a “Mersenne-like” form qn ± u (xn − u(x) in the polynomial case). Since
each arithmetic operation in the ring is followed by a modular reduction step, it
is desireable to keep the complexity of modular reduction low. Therefore, the best
choices for u (and likewise u(x)) are typically those that are small (resp., of small
degree) and of low weight. 2
For lack of better terminology we shall name this class of codes ‘accumocyclic’.
This name was chosen for two reasons: (a) to indicate the close relationship to cyclic
codes and (b) to distinguish its behavior with regard to shifted codewords. Upon
a single shift of to the left, the ring defining polynomial mandates that the most
significant coefficient does not just simply wrap around back to the least significant
position, but that it is multiplied with u(x) and added to the remaining shifted
coefficients, i.e. cn−1u(x) is accumulated in the lower part of the codeword. We define
this concept formally as an accumocyclic shift with respect to u(x).
Definition 5.4.1. Let u = (un−1, un−2, . . . , u0) denote the q-ary n-tuple associated
2Since implementation of q-ary arithmetic in digital circuits is more complex and less common
for bases q 6= 2, we shall focus on codes with a binary representation throughout the remainder of
this chapter.
40 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
with the polynomial u(x) over GF(q). The accumocyclic shift c′ = c
+
ªu 1 of the
codeword c by one position to the left with respect to u denotes the q-ary n-tuple
resulting from the linear combination of (cn−2, cn−3, . . . , c0, 0) and the scalar multiple
cn−1u over GF(q), i.e. c′ = (cn−2+ cn−1un−1, cn−3+ cn−1un−2, . . . , c0+ cn−1u1, cn−1u0).
Derive an associated polynomial c′(x) = (cn−2+cn−1un−1)xn−1+(cn−3+cn−1un−2)xn−2+
. . . + (c0 + cn−1u1)x + cn−1u0. It is easy to see that this polynomial is equivalent to
the product xc(x) mod xn − u(x), thus establishing the correspondence
c
+
ªu 1←→ c(x) 7→ xc(x) mod xn − u(x) .
Accumocyclic shifts by i positions and those to the right can be defined in a similar
manner. We can now define accumocyclic codes as codes invariant under accumocyclic
shifts according to Definition 5.4.1:
Definition 5.4.2. Let C a linear block code of length n over GF(q), viewed as a
vector subspace of GF(q)[x]. Further, let u(x) denote a polynomial of some degree
m < n, specific to the code. We say the code C is accumocyclic with respect to u if
and only if any accumocyclic shift of a codeword c ∈ C, according to Definition 5.4.1
again yields a codeword c′ ∈ C.
Observe that, due to linearity, any addition of two codewords a and b modulo q
yields another codeword. Even though this new class of codes is not cyclic according
to Definition 5.2.1, its codes possess certain useful properties which are of interest to
us, mainly the possibility to embed cryptographically significant fields and efficient
modular arithmetic. We can obtain a broad range of suitable field and scaling factor
parameters via factorization of the ring moduli. For illustration we have included a
selection of parameters in Tables A.1, A.2 and A.3 in the Appendix.
Example 5.4.1. Let n = 18, k = 13, q = 2, and u(x) = x+ 1. One can easily verify
that g(x) = x5 + x2 + 1 is a polynomial such that g(x)|xn + u(x). The accumocyclic
5.5. HOMOMORPHIC EMBEDDING IN RINGS 41
code C = {g(x)a(x) : a(x) ∈ GF(2)[x]} generated by g(x) thus forms an ideal in the
quotient ring GF(2)[x]/(x18 + x + 1). It is the set of all codewords c(x) = g(x)a(x)
for data polynomials a(x) of degree less than k.
We want to illustrate the difference to classical cyclic codes. Consider the code-
word c(x) = x17 + x14 + x9 + x5 + x4 + 1. In contrast to a cyclic code, a cyclically
shifted codeword c′(x) = a(x) ª 1 = x15+x10+x6+x5+x+1 is not in C, since it is
not a multiple of g(x). Yet, an accumocyclic shift of a codeword with respect to u(x)
c′′(x) = c(x)
+
ªx+1 1
= xa(x) (mod x18 + x+ 1)
= x18 + x15 + x10 + x6 + x5 + x (mod x18 + x+ 1)
= x15 + x10 + x6 + x5 + 1
produces a new codeword, since c′′(x) = x15 + x10 + x6 + x5 + 1 = g(x)(x10 + x7 +
x4 + x2 + 1).
5.5 Homomorphic Embedding in Rings
Consider the case when a field F is embedded in a larger ring R. Embedding works
by mapping any element a ∈ F to an element φ(a) ∈ R via a suitable embedding
function, i.e. φ is an injective ring homomorphism. Therefore we can carry out all
arithmetic operations originally defined for F in R instead. At the end of the com-
putation it will be required to use the inverse mapping φ−1 : R→ F to transform the
result back from the ring to the field. In the following we will define explicit map-
ping functions to make our approach more transparent. We investigate two different
methods for embedding: Basic and Idempotent scaled embedding. Both methods tie
into the theory of cyclic codes for certain parameter selections, but can also be seen
as a generalization of the concept without making claims about any sort of minimum
42 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
distance metric. We already indicated this link to coding theory in Section 5.2.3. We
would like to point out that both methods apply equally well to integer and poly-
nomial rings. For sake of simplicity, however, we will refrain from making explicit
distinctions in notation unless such a distinction is required.
The idea of scaled embedding is closely related to the modulus scaling techniques
set forth in [O¨SS04]. There a scaling factor s was applied to the field modulus p to
obtain a scaled modulus m = s · p of Mersenne-like form that allows efficient shift-
and-subtract modular reduction. The redundancy introduced by scaling the modulus
allows us to implement an error detection scheme. A na¨ıve direct mapping φ(a) = a,
however, does not provide error detection capabilities, since mapped elements do not
form an ideal over the ring and errors would be indistinguishable from data. Scaled
embedding, on the other hand, multiplies operands by a generator value g which may
actually be distinct from the modulus scaling factor s. It effectively partitions the
ring R into cosets, of which only one contains valid codewords. Error detection can
therefore be based upon checking for membership in the correct coset. With respect
to the choice of a suitable modulus scaling factor we strive to achieve two goals:
1. to select the target ring large enough to have sufficient redundancy for error
detection purposes, i.e. an amount proportional to the length of the scaling
factor, i.e. log2 s;
2. to obtain a ring modulus m = p · s for which an efficient reduction technique
exists. This helps to offset some of the overhead in complexity that we incur by
adding redundancy.
If the field F and its associated modulus (prime integer or irreducible polynomial)
p are determined by the application, then the choices for a suitable scaling factor
might be limited. If, however, the value of the modulus is not fixed, then one can
choose a suitable pair (s, p) based on the required levels of security (modulus size)
5.5. HOMOMORPHIC EMBEDDING IN RINGS 43
and redundancy. The computationally most efficient moduli of Mersenne-like form
have very small Hamming-weight, e.g. less than 5. Therefore, in order to find such a
pair we let m = 2n ± u, with u small and n = dlog2 p+ log2 se, be the preferred ring
modulus and find a suitable field by way of factorization. Depending on the size of n
factorization might take a long time, especially in the case of finding suitable prime
fields.
For polynomial moduli the ideal form is a binomial m(x) = xn ± 1, as in cyclic
codes, but other moduli xn±u(x) with small degree u(x) are also conceivable. When
u(x) = 1 then the reduction of partial products, e.g. during the shifting step of bit-
serial multiplication, becomes trivial since the shift with reduction can be simplified
to a bit rotation due to the equivalence xn ≡ ∓1 mod p(x). The factorization of
binomials is well studied, and there exist many efficient methods. What we are looking
for specifically are large irreducible factors of prime degree. This is important mainly
for applications in elliptic curve cryptography, as previously discussed.
5.5.1 Basic Scaled Embedding
Once we have found suitable parameters (p, s) for the modulus and its scaling factor,
we can encode the input operands by means of multiplication with the generator value
g. For basic scaled embedding this is the same value as the modulus scaling factor s,
i.e. g = s. The function φs(a) = g · a maps an element a from F to R. It provides
error detection capabilities since all valid elements of R must be proper multiples
of g. Note that while the mapping preserves addition, it does not preserve regular
multiplication, i.e.
φs(a · b) = g · a · b 6= φs(a) · φs(b) .
With an alternative definition3 of the product operation of the ring, however, that
implicitly absorbs the extra scaling factor, φs(·) becomes a ring homomorphism with
3We use the symbol ? to prevent confusion with regular multiplication.
44 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
respect to (+, ·) and (+, ?) operations, i.e. φs(a · b) = φs(a) ? φs(b).
Definition 5.5.1 (g-absorbing multiplication). LetR be a ring under + and ·, g ∈ R.
Further, let a, b ∈ F and A = φs(a), B = φs(b) ∈ R. Define ? : R × R → R via
A ? B = AB/g, when this belongs to R, otherwise A ? B will be left undefined.
Observe that ? is associative
(A ? B) ? C = A ? (B ? C)
and distributive
A ? (B + C) = A ? B + A ? C
when defined.
Therefore, multiplications in R are implemented using the ? operation instead of
regular ring multiplication, while addition in the ring remains the same. Since the
value for g is constant for a specific modulus, division may be implemented more
efficiently than in the general case. Algorithms for division by constants have been
treated, for example, in [Par00].
Error Detection
As mentioned earlier, detection of errors can be based on computing the remainder
of a division by g. A value of zero indicates that the operand is likely to be free
of errors. We have to use caution here, because quite naturally our scheme cannot
detect error patterns that are proper multiples of g. We apply a relatively simple
error model to determine the error coverage of this method: We assume that errors
only occur as additive terms on input operands and that the operation itself is fault-
free. Consequently, the output is the sum of the correct result and another additive
error term related to the input errors and the operation. While such an error model
may be rather simple, we would like to make a point for its validity in the context of
5.5. HOMOMORPHIC EMBEDDING IN RINGS 45
fault-insertion attacks. From an adversarial point of view, the most accessible targets
with high probability of success for introducing an error are storage elements like
registers and SRAM memory cells. A glitch attack on such a bi-stable device, e.g.
using optical fault induction with a focussed laser beam [SA02], is able to cause an
error regardless of the exact point in time during which it is carried out (with respect
to the clock interval). A glitch in a combinational part of a circuit will manifest itself
as an error only if it reaches the next register in time for the next clock edge and if
it does not significantly violate setup and hold time requirements.
We will now determine the conditions under which we can detect errors. Let
A = φs(a), B = φs(b) ∈ R denote the fault-free input operands of the multiplication
operation A ? B and C the fault-free result.
C = A ?B = (A ·B)/g (mod m) = ((g · a · g · b)/g) (mod m) = g · a · b (mod m)
Furthermore let C ′ denote the result in the presence of additive error terms eA, eB ∈ R
on the inputs:
C ′ = (A+ eA) ? (B + eB)
= (g2 · a · b+ g(a · eB + b · eA) + eA · eB)/g (mod m)
= C + a · eB + b · eA + eA · eB
g
(mod m)
= C + eC
Here eC ∈ R is the resulting error on the output. Certain errors can be detected
immediately during the division step of the ? multiplication procedure, e.g. when the
remainder C ′ mod g 6= 0, which means that g does not divide eA · eB. There are two
other non-trivial cases of potentially undetectable errors:
1. Certain one-sided errors, e.g. eA = 0, eB 6= 0, such that eC = a · eB mod m,
which are not detectable if g|eC .
46 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
2. Some two-sided errors eA, eB 6= 0. Now we have eC = a·eB+b·eA+ eA·eBg mod m.
The error is undetectable iff g|eC .
The procedure for error detection is based on modular reduction of the operands with
respect to the scaling factor g and checking for a non-zero remainder. Here it can be
performed outside of the critical path of the computation. As long as there is no error
in any of the previous operations, the result can immediately be used as the input
for subsequent operations, while an error check is performed in parallel. The major
problem we face with basic scaled embedding is the division step that is intrinsic to
the ? multiplication, since it adds to the critical path. Division is notoriously complex
in hard- and software implementations unless the divisor is a constant of special form,
which is not usually the case. In the next section we present a modification to the basic
scaled embedding idea, which completely avoids the division step of ? multiplication.
5.5.2 Idempotent Scaled Embedding
The division step of ? multiplication in the basic scaled embedding scheme is required
because both operands contain a multiplicative factor g which results in a square
factor g2 for the product. One way to avoid the extra g is to perform multiplication
with only one scaled input, the other unscaled, but this introduces a host of other
problems. First we would lose error detection capabilities in the unscaled operand,
and secondly the product of two results from previous multiplications would again
require division.
One elegant solution is to find a scaling factor that is idempotent with respect
to the scaled modulus, i.e. g ≡ g2 mod m. A class of non-separate arithmetic codes
known as AN codes use the same encoding principle as scaled embedding and suffer
from the same problem of an extra residue of the generator value. In [Pro89] Proudler
introduced a class of idempotent AN codes that preserve addition and multiplication
in the ring. These codes can therefore be used to form a ring homomorphism φi(·)
5.5. HOMOMORPHIC EMBEDDING IN RINGS 47
that avoids division altogether. A critical flaw of idempotent AN codes is, however,
that a one-sided error, i.e. one that only appears in one of two input operands, will be
masked in a multiplication with the other error-free operand due to the distributive
law:
A′ = φi(a) + eA
B = φi(b)
A′ ·B = (g · a+ eA)(g · b) (mod m)
= g2 · a · b+ g · b · eA (mod m)
= φi((a+ eA)b)
This flaw can be compensated for by extending from idempotent AN to idempotent
AN+B codes. These were also introduced in [Pro89] and like AN codes derive their
names from the encoding procedure. In addition to being scaled by the generator
value g, a constant term c is added to the operands during encoding. AN+B codes
exist whenever the ring modulus m = p · s and gcd(p, s) = 1. Then we can construct
the values g and c as orthogonal idempotents with respect to the modulus m as
follows:
g = (s−1 mod p) s (5.1)
c = (p−1 mod s) p (5.2)
where
g2 ≡ g (mod m) , (5.3)
c2 ≡ c (mod m) and (5.4)
g · c ≡ 0 (mod m) . (5.5)
Unlike AN codes, AN+B codes are no longer addition preserving. In the presence of a
heterogeneous mix of addition and multiplication operations it is therefore necessary
48 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
to convert4 operands back and forth between codes. Luckily this is a rather trivial
exercise since both codes share the same generator g. We can re-define multiplication
to implicitly handle the conversion steps by adding the constant term c to each
operand before multiplication and subsequently subtracting it from the result. The
difference to ordinary multiplication in the ring is indicated through the use of the ?
symbol:
Definition 5.5.2. Let A = φi(a) and B = φi(b) denote operands embedded in
R, where φi(x) = g · x mod m. Then addition in the ring is defined as usual and
multiplication is re-defined as
A ? B = (g · a+ c) · (g · b+ c)− c (mod m)
= g2 · a · b+ c · g(a+ b) + c2 − c (mod m)
= φi(a · b) = g(a · b) (mod m)
due to the equivalences defined in (5.3), (5.4) and (5.5).
We can thus define an idempotent ring homomorphism with respect to (+, ?) and
(+, ·) operations as
φi(0) = 0,
φi(a) + φi(b) = φi(a+ b) and
φi(a) ? φi(b) = φi(a · b) .
Now a one-sided error eA will not be masked anymore, provided that s - eA:
A′ ? B = (g · a+ c+ eA)(g · b+ c)− c (mod m)
= g(a · b) + eA(g · b+ c) (mod m)
≡ eA mod s
4Note that conversion is only necessary at the boundary between heterogeneous operations like
addition and multiplication. It can be omitted for homogeneous operations like modular exponen-
tiation, which are based exclusively upon multiplication.
5.5. HOMOMORPHIC EMBEDDING IN RINGS 49
Once all computations have been performed the non-redundant result needs to be
converted back from the ring to the field via the inverse homomorphism φ−1i (·). This
is achieved through modular reduction of the result with respect to the field modulus
p.
We now have an efficient method for embedding a field into a larger ring with
meaningful redundancy that we want to use for error detection purposes. Since valid
codewords need to be proper multiples of the generator value g ≡ 0 mod s, an error
check can be performed by computing the remainder of a division by s. An additive
error eR on the result will be detected as e
′
R = eR mod s, if it is not evenly divisible
by s. Hence, if e′R = 0, the result can be assumed free of errors with high probability.
There may be cases, however, in which an error eA remains undetectable. In the
following we establish the probability of this happening.
Error Detection
We apply the same error model as before, which assumes that errors only occur at
the input operands. In the presence of additive error terms we can model system
behavior for addition as
A′ = A+ eA , B′ = B + eB
A′ +B′ = g(a · b) + (eA + eB) mod m
eR+ = eA + eB (5.6)
and for multiplication as
A′ ? B′ = (g · a+ eA + c) · (g · b+ eB + c)− c mod m
= g(a · b) + eA(g · b+ c) + eB(g · a+ c) + eA · eB mod m
eR? = eA(g · b+ c) + eB(g · a+ c) + eA · eB mod m . (5.7)
From the reduction modulo s we obtain the detectable portion of the error term. In
the case of addition (5.6) this is e′R+ = eA+eB mod s. A faulty result is undetectable if
50 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
eA ≡ −eB mod s. For simplicity we assume that the errors eA and eB are independent
and identically distributed random variables from uniform. Thus the probability of
an undetectable error is 1/s.
An error occurring during multiplication will produce the term e′R? = eA + eB +
eA · eB mod s which we obtained through application of the equivalences g ≡ 0 mod s
and c ≡ 1 mod s to (5.7). We can find the probability of an undetectable error during
multiplication using the following lemma:
Lemma. Let X,Y be two independent and identically distributed random variables
uniform over [0, s−1] and let the event A = {(X = x, Y = y) : x+y+x·y ≡ 0 mod s}.
Then the probability of A occurring is Pr[A] = Φ(s)/s2, where Φ() denotes the Euler
totient function.
Proof. We can rewrite the event A as follows: A = {(X = x, Y = y) : y = f(x)},
where f(x) = −x · (x + 1)−1 mod s. The function f(x) will only be defined if the
inverse of x+ 1 exists. For any given modulus s this is the case only for Φ(s) choices
in the range 0 ≤ x < s. Hence, f(x) is defined and has a value with probability
pR = Φ(s)/s. Y takes on a specific value y with probability pL = 1/s. The joint
probability of the event A occurring is therefore p = pL · pR = Φ(s)/s2, due to the
independence of X (and hence f(X)) and Y .
Here the event A stands for the occurrence of an undetectable error at the output
of the multiplier. It is easy to see that the best error coverage can be obtained when
the modulus scaling factor s is composite and large. We would like to re-iterate that
the error detection mechanism requires a full modular reduction by s, which in the
general case does not have a suitable special form as the scaled modulus m. While
this might be viewed as a drawback, it should be noted that checking for errors can
be done outside of the critical path (in hardware) or at regular intervals (software
realization), while the main computation continues operation. As a matter of fact,
5.5. HOMOMORPHIC EMBEDDING IN RINGS 51
the regular field modulus p does not in general have a suitable special low Hamming-
weight form either, such that the overhead due to error detection is easily offset by
the efficient reduction modulo m = s · p.
5.5.3 Error Correction Using Algorithm-Based Fault Toler-
ance
Quite naturally one would like to build an arithmetic architecture with the ability
to also correct errors that occur during computation. For cyclic codes, syndrome
decoding allows the correction of the most likely error pattern (assuming blind fault
induction). It does not, however, give good results in the presence of burst errors,
which would likely occur if an adversary tried to attack the computation. The reason
is that in an adversarial setting one has to make worst case assumptions and cannot
expect that decoding to the nearest codeword will successfully correct the error.
A different approach is the use of algorithm-based fault tolerance. The princi-
pal idea here is to keep valid input operands around until after the computation
has finished and an error check determines a valid result. If the error check fails,
the computation can be repeated (replayed) until a valid result is available. Alterna-
tively, if the computation fails repeatedly, an alarm can be signaled and the operation
canceled. The advantage of this method is clearly its robustness in the presence of
transient burst errors. It does not matter which of all possible errors covered under
a specific syndrome triggered the detection, when the computation can simply be
repeated. Another advantage is the relatively low overhead that is required, which is
mostly caused by the storage elements necessary to keep backup copies of operands.
A potential disadvantage is that the method does not degrade gracefully, meaning
that permanent faults due to stuck-at-0/1 errors can not be compensated for. Circuit
defects thus render this method completely useless and do not help, for example, to
increase the yield of circuit production.
52 CHAPTER 5. CYCLIC CODES AND HOMOMORPHIC EMBEDDING
5.6 Analysis and Discussion
We have presented a novel scheme for fault-tolerant finite field computation with
applications in public-key cryptography. Homomorphic scaled embedding is practical
and allows designers of cryptographic systems to add fault tolerance with moderate
resource overhead. It provides adequate protection against transient faults of either
random or adversarial nature under the assumption of reasonable limitations to the
attacker’s capabilities. This kind of protection is particularly important, due to the
continuing success of fault-insertion attacks on cryptographic embedded systems.
Chapter 6
Robust Codes: Tamper Resistance
Under a Stronger Error Model
In chapter 4 we introduced a general error model for cryptographic devices in adver-
sarial situations. We left open the question of adversarial capability, i.e. the capacity
of an attacker to successfully introduce faults into the system. In general this is a
difficult property to assess, as much of it depends on the adversarial’s motivation and
resources, as well as the system’s physical countermeasures.
In the previous chapter we implicitly assumed that the necessary temporal and
spatial precision to cause errors of a particular bit-pattern were unavailable to the
adversary. If they were, the proposed methods for arithmetic with encoded operands
would fail, due to the linearity of the scheme; any linear combination of two codewords
will be a codeword again, and therefore the attacker may simply choose to cause an
error with the same pattern as a valid codeword. It would be relatively easy to
generate a valid codeword, since (paraphrasing Kerckhoff’s famous principle [Ker83])
one should always operate under the assumption that basic system parameters are
known to the adversary, with the notable exception of key material.
Although linear codes may seem sufficient for the protection of public-key arith-
53
54 CHAPTER 6. ROBUST CODES
metic primitives today, advances in fault insertion methodologies may suddenly be-
come available which provide the necessary precision for feasible attacks. In this
chapter we address a scenario in which an attacker may introduce arbitrary error
vectors reliably, with nearly arbitrary precision.
In light of such worst-case assumptions, a linear code based error detection scheme
is bound to fail. We therefore turn our attention to non-linear schemes and develop
a family of robust arithmetic codes with a very high error detection probability. It is
capable of detecting errors of arbitrary weight, and the small portion of undetected
errors is highly data dependent, which makes it nearly impossible to find successful
error patterns without a priori knowledge of the data.
6.1 Introduction to Robust Codes
A family of systematic non-linear error detecting codes, termed ‘robust codes’, was
derived from systematic linear block codes in [KT04]. Their use in symmetric ciphers
like the AES has been proposed in [KKT04] and later refined in [KKT05]. The
robustness of these codes is due to the much more uniform error detection capabilities
these codes offer. Unlike linear codes, where the detection capability depends only on
the weight of the error vector, for robust codes the probability Q(e) of an undetected
error e depends on the combination of error and data patterns.
Thus, in the case of a cryptographic key which is not known to the attacker a
priori, a fault-injection attack is much more difficult to carry out than with a linear
encoding scheme.
Robust codes can achieve optimality according to the minimax criterion, that is,
minimizing over all (n, k) codes the maxima of the fraction of undetectable errors
Q(e) for e 6= 0. The following definition from [KT04] rigorously defines a particular
class of non-binary codes.
6.2. ROBUST ARITHMETIC RESIDUE CODES 55
Definition 6.1.1. Let V be a linear p-ary (n, k) code (p ≥ 3 is a prime) with
n ≤ 2k and check matrix H = [P |I] with rank(P ) = n − k. Then CV = {(x,w)|x ∈
GF (qk), w = (Px)2 ∈ GF (qr)}
For binary robust codes (q = 2) the definition is slightly different; instead of
squaring the check-symbol w is obtained through cubing. We will give a more detailed
introduction to robust non-linear block codes in Chapter 7.
While robust codes work well with symmetric ciphers that employ only little
more than table look-ups, XORs and byte-wise rotations, they are virtually unusable
within the finite field arithmetic structure that forms the basis of most public-key
algorithms.1
In the following sections we will introduce a construction for robust arithmetic
residue codes, inspired by the construction in [KT04]. Based on these codes we present
a scheme for robust multi-precision arithmetic over the positive integers. Our scheme
lends itself well for straightforward implementation of standard modular multiplica-
tion techniques, i.e. Montgomery or Barrett Multiplication, secure against active fault
injection attacks.
6.2 Robust Arithmetic Residue Codes
As mentioned before, a class of non-linear systematic error detecting codes, so-called
“robust codes”, were proposed by Karpovsky and Taubin [KT04]. They achieve
optimality according to the minimax criterion, that is, they minimize over all (n, k)
codes the maxima of the fraction of undetectable errors Q(e) for e 6= 0. While
they are suitable for data transmission in channels with unknown characteristics, and
also for robust implementation of symmetric-key cryptosystems with little arithmetic
structure, they do not preserve arithmetic. We thus propose a new type of non-
1Optimal extension fields are an exception, see Chapter 7
56 CHAPTER 6. ROBUST CODES
linear arithmetic code, based on the concept of arithmetic residue codes. We define
robustness as follows:
Definition 6.2.1. Let C = {(x,w)|x ∈ Z2k , w = f(x) ∈ Fp} be an arithmetic single-
residue code with a function f : Z2k 7→ Fp to compute the check symbol w with
respect to the prime check modulus p of length r = dlog2 pe bits. A non-zero error
e ∈ {(ex, ew)|ex ∈ Z2k , ew ∈ Z2r} is masked for a message x, when (x+ex, w+ew) ∈ C,
i.e. iff
f
(
(x+ ex mod 2
k)
)
= f(x) + ew mod 2
r . (6.1)
The error masking probability for a given non-zero error is thus
Q(e) =
|{x|(x+ ex, w + ew) ∈ C}|
|C| . (6.2)
We call the code C robust, if it minimizes maxima of Q(e) over all non-zero errors.
Total robustness is achieved for maxe 6=0(Q(e)) = 2−r. We also call C ²-robust if it
achieves an upper bound maxe 6=0(Q(e)) ≤ ² · 2−r, where ² is a constant much smaller
than 2r.
In the following we propose a class of non-linear single-residue arithmetic codes
Cp based on a quadratic residue check symbol, which achieves ²-robustness. Since in
practice total robustness is hard to achieve, we will from now on refer to ²-robustness
simply as robustness.
Theorem 6.2.1 (Robust Quadratic Codes). Let Cp according to Definition 6.2.1,
with f(x) := x2 mod p. Cp is robust iff r = k and 2
k − p < ², and has the error
masking equation
(x+ ex mod 2
k)2 mod p = w + ew mod 2
k (6.3)
Proof. To prove robustness we proceed by proving an upper bound ² on the number
of solutions of the error masking equation (6.3), as that directly translates into a
6.2. ROBUST ARITHMETIC RESIDUE CODES 57
bound on Q(e). The modulo 2k operator from the LHS of (6.3) stems from the
limitation of the data path to k-bits. This limits the ranges of both the message and
the message error to 0 ≤ x, ex < 2k. We can therefore remove the modulo 2k operator
by distinguishing between the two cases x + ex < 2
k and x + ex ≥ 2k. Similarly, an
error is masked only if the faulty check symbol w < p, so for k = r we can distinguish
between the three cases w+ ew < p, p ≤ w+ ew < 2k and 2k ≤ w+ ew < 2k+ p. This
allows us to simplify the RHS of (6.3).
1. Solutions x < 2k − ex: An error (ex, ew) is masked iff
(x+ ex)
2 mod p = w + ew mod 2
k
Simplifying the RHS we have the following three cases:
(a) If w < p− ew, the error is masked iff
(x+ ex)
2 mod p = w + ew (6.4)
If e = (p, 0) eq. (6.4) has exactly 2k−p solutions. For ex 6= p and ex ≥ 2k−p
there exists at most a single solution; at most two solutions exist in the
case of ex < 2
k − p.
(b) If p − ew ≤ w < 2k, the error will never be masked, since a check symbol
w ≥ p will always be detected.
(c) For w ≥ 2k − ew the error will be masked iff
(x+ ex)
2 mod p = w + ew − 2k . (6.5)
Eq. (6.5) has at most two solutions.
2. Solutions x ≥ 2k − ex: An error (ex, ew) is masked iff
(x+ ex − 2k)2 mod p = w + ew mod 2k
For the RHS we distinguish the following three cases:
58 CHAPTER 6. ROBUST CODES
(a) If w < p− ew, the error is masked iff
(x+ ex − 2k)2 mod p = w + ew (6.6)
Eq. (6.6) has at most two solutions, unless we have an error e = (2k−p, 0),
in which case there are 2k − p solutions.
(b) If p − ew ≤ w < 2k, the error will never be masked, since a check symbol
w ≥ p will always be detected.
(c) For w ≥ 2k − ew the error will be masked iff
(x+ ex − 2k)2 mod p = w + ew − 2k . (6.7)
Eq. (6.7) has at most two solutions.
Q(e) is determined by the number of solutions to the error masking equation (6.3).
This gives us the following bound:
There are at most 2k− p+2 solutions to (6.3) for errors of the form (p, 0)
or (2k − p, 0), and at most 8 solutions for all other errors.
We would like to point out that the transition from linear arithmetic to robust
quadratic codes with the same parameters k, r and p results in a much more uniform
distribution of the error detecting capability of the code. For example, for linear
codes with k = r there are double errors with ex = ew such that Q(e) is very close
to 1, i.e. the errors cannot be detected. For robust quadratic codes with the same
parameters, Q(e) is close to zero for all e.
We now give an intuitive argument to show the existence of practical robust codes
for cryptographic purposes with the help of the prime number theorem. The idea here
is that for fault-tolerance in an adversarial situation, the probability of not detecting
6.3. ROBUST ARITHMETIC OPERATIONS 59
an error should be insignificantly small. As we saw from the proof, in the worst case
we have a probability of at most Q(e) = 8 · 2−k = 2−k+3 of not detecting an error
(assuming a uniform distribution of messages, and 2k − p < 8). Therefore, a Q(e)
that makes insertion of an error infeasible for an attacker, requires a sufficiently large
digit size k and a prime p close enough to 2k so that the difference does not increase
Q(e) too much. For example, for k = r = 32 the distance from 2k to the closest
k-bit prime from below is less than 8, i.e. p = 232 − 5, thus bounding Q(e) by 2−29.
According to the prime number theorem the number of primes smaller than or equal
to x is approximately x/ lnx, i.e. for our case 2k/(k ln 2). Intuitively it thus seems
reasonable to expect to find a prime within the interval [2k−(k ln 2), 2k). In Table 6.1
we give the distance of the primes closest to 2k from below for practical values of k.
Table 6.1: Closest Prime Number Distances from 2k for Practical Values of k
k 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
2k − p 1 5 1 3 9 3 15 3 39 5 39 57 3 35 1 5
k 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
2k − p 9 41 31 5 25 45 7 87 21 11 57 17 55 21 115 59
k 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
2k − p 81 27 129 47 111 33 55 5 13 27 55 93 1 57 25 59
6.3 Robust Arithmetic Operations
In the previous section we proved the robustness of quadratic codes for digits of size
k bits. We now wish to apply them in a generalized framework for multi-precision
arithmetic over the positive integers.
Due to the range limitation of the information bits to 0 ≤ x < 2k, we need to
handle any overflow resulting from arithmetic operations. This may be a carry bit
60 CHAPTER 6. ROBUST CODES
generated by the addition of two k-bit operands, or the 2k-bit result of a multipli-
cation. The new digits that are created in this manner will need their own check
symbols, which cannot be derived from the input operands’ check symbols alone.
Thus they need to be derived purely from the information bits of the new digits,
creating a potential loophole for the insertion of an error. This can be avoided by
re-computing the joint check symbol from the newly generated individual check sym-
bols and comparing it to the output of the predictor. This re-computation represents
an integrity check which allows us bridge discontinuities introduced by interleaving
mixed modulus operations, here the check modulus p and the implicit range limiting
modulus 2k. Once the integrity check is in place we can perform standard arithmetic
operations, and implementing an algorithm like Montgomery’s for modular arithmetic
becomes straightforward.
In the following we show how this check may be implemented for various arithmetic
primitives. Let (a, |a2|p) and (b, |b2|p) denote encoded input operands a and b, where
|x2|p is short-hand notation for x2 mod p. We also introduce mnemonics for these
primitives, in order to tie them into a robust variant of the digit serial Montgomery
multiplication algorithm in the next section.
Addition (RADD and RADDC): RADD (Robust ADDition) and RADDC (Ro-
bust ADDition with Carry) compute the sum of the two input operands. This is
depicted in Figure 6.1. For reference, the operators ⊕p and ⊗p stand for addition
and multiplication modulo p, respectively. The sum c = a + b (+cin) may be larger
than 2k by at most a single bit. Let ch denote this new carry, and cl the k-bit
sum. The predictor computes the joint check symbol |c2|p as the sum of the check
symbols and additional terms involving the operands: |c2|p = |(a + b + cin)2|p =
||a2|p + |b2|p + 2(ab+ cin(a+ b)) + cin|p. For error detection we first create the check
symbol for the k-bit sum |c2l |p (the check symbol for the carry bit is the carry bit
6.3. ROBUST ARITHMETIC OPERATIONS 61
itself). Then we re-compute the joint check symbol as
|c2|∗p = |(ch2k + cl)2|p
=
∣∣ch · |22k|p + ch · |cl|p · |2k+1|p + |c2l |p∣∣p
=
∣∣ch · |22k + cl · 2k+1|p + |c2l |p∣∣p (6.8)
If the check |c2|∗p = |c2|p holds, then the result is deemed to be free from errors.
The resulting carry from both RADD and RADDC is held in a register local to the
addition circuit. If the following addition operation is RADDC, then that carry is
used for computation of the new sum. If it is RADD, then a zero carry is used.
Multiplication (RMUL): The product of a and b and its joint check symbol is
(c, |c2|p) = (a · b, ||a2|p · |b2|p|p). However, the previous tuple is not a code word, since
c may exceed 2k. We therefore split c into two halves ch and cl (cf. Figure 6.2), both
of which are within the desired range:
c = ch · 2k + cl 0 ≤ ch, cl < 2k .
We then compute the check symbols |c2h|p and |c2l |p separately, and establish their
integrity with the composite check symbol |c2|p:
|c2|∗p =
∣∣(ch · 2k + cl)2∣∣p
=
∣∣c2h · 22k + ch · cl · 2k+1 + c2l ∣∣p
=
∣∣|c2h|p · |22k|p + |ch|p · |cl|p · |2k+1|p + |c2l |p∣∣p (6.9)
Observe that the values |22k|p and |2k+1|p are constant for a given implementation
and that |ch|p and |cl|p are intermediate results from the computation of the separate
halves’ check symbols. Hence we have all the necessary ingredients to re-compute the
joint check symbol as |c2|∗p and compare it to the value obtained from the predictor.
If the comparison passes we assume there were no errors.
62 CHAPTER 6. ROBUST CODES
Error D
etection N
etw
ork
p
|b2|p|a2|pa b
c l
|x|p
lc |c l2 p| x==y?
p
carry
p
2x
p
|x2|p
k+1x+22k|p|2
2x
0
Error
0
RADDC?
0
0 p−1
Predictor
Figure 6.1: Robust Addition
Predictor
2|p
ch c l
|a2|pa
hc |ch2 p| lc |c l2 p|
|x|p
|x2|p
p
|x2|p
|x|p
p |2k+1  x|  p
|22kx|p p
p
b
x==y?
Error
Error D
etection N
etw
ork
|b
Figure 6.2: Robust Multiplication
6.4. ROBUST MONTGOMERY MULTIPLICATION 63
Shifts, Subtraction, Logic Operations: We can apply similar re-computation
techniques for other operations. Out of space considerations and since we do not need
these other operations for the next section, we skip their details at this point.
Error Detection: The comparison between the predictor output and the re-com-
puted joint check symbol is an easy target for an attack if carelessly implemented.
We therefore require implementation as a totally self-checking circuit [Pra86]. The
same holds for any other integrity checks.
6.4 Robust Montgomery Multiplication
We now show how to apply our robust code in a digit serial Montgomery Multiplica-
tion scheme. A good overview over several variants of the Montgomery algorithm is
given in [KAK96]. In this example we will refer to the finely integrated operand scan-
ning (FIOS) variant. It is the most suitable one for hardware implementations since
it can be used in a pipelined fashion offering some degree of parallelization [Gau02].
Algorithm 6.1 k-bit Digit-Serial FIOS Montgomery Multiplication
Require: d = {0, . . . , 0}, M ′0 = −M−10 mod 2k
1: for j = 0 to e− 1 do
2: (C, S)⇐ a0bj + d0
3: U ⇐ SM ′0 mod 2k
4: (C, S)⇐ (C, S) +M0U
5: for i = 1 to e− 1 do
6: (C, di−1)⇐ C + aibj +MiU + di
7: end for
8: (de, de−1)⇐ C
9: end for
In the following we require some basic familiarity on part of the reader with
64 CHAPTER 6. ROBUST CODES
the way of how Montgomery multiplication works. To review briefly: the objective
is to compute the modular product of two N -bit numbers with respect to the N -
bit modulus M . Montgomery’s algorithm requires the initial transformation of all
operands into residues of the form xˆ = xR modM and some final transformation back
x = xˆR−1 modM . Here R is the Montgomery radix, usually 2k·e, where e = dN/ke
represents the number of digits per operand. Without loss of generality we assume
that the transformation into the Montgomery residue system has already taken place
and we operate entirely within the residue system, so in order to simplify notation
we will refer to a residue xˆ simply as x.
The k-bit digit serial FIOS Montgomery algorithm takes as its inputs the e-digit
vectors a and b, and computes the product MM(a, b) = a·b·R−1 modM . The valueM ′0
is pre-computed whenever the modulus changes. In terms of notation, a pair (C, S)
represents the concatenation of two variables as the destination for the result of an
operation. Furthermore, the variable C is slightly larger than the other variables, i.e.
k+1 bits. This is so to efficiently handle extra carries from the accumulation of C, di
and the two products aibj and MiU . The division by R is handled by the algorithm
implicitly. For example, in line 4 the sum (C, S) +M0U is assigned to (C, S), but in
the following step S is dropped. This shift to the right by k bits, repeated e times,
results in division by R.
As one can easily verify, Algorithm 6.1 consists of only very basic addition and
multiplication steps. We may therefore obtain a robust digit-serial Montgomery al-
gorithm (Alg. 6.2) simply by mapping all arithmetic steps to our robust arithmetic
primitives introduced in the previous section. Additionally we insert intermediate
checks during which we verify the integrity of operand values and their check sym-
bols. This is indicated by a call to the pseudofunction Check((x, |x2|p), . . .). Although
not indicated in the algorithm description, we further assume that the error signal
generated by the the internal integrity check within the arithmetic primitives RADD,
6.5. ANALYSIS AND DISCUSSION 65
RADDC and RMUL, is also constantly evaluated. In the case of an error the algo-
rithm is aborted with an exception.
Some comments about the robust algorithm: Algorithm 6.1 appears much shorter
than Algorithm 6.2, since it combines multiple arithmetic operations into a single
step. Also, while it handles carries implicitly using a larger width variable C, the
robust algorithm is restricted to a digit size of exactly k bits. Thus, extra carry
handling steps are required. In Alg. 6.2, line 6, the destination of the top half of the
result is not assigned: (−,−). This is equivalent to computing the result modulo 2k,
as in Alg. 6.1, line 3. A similar thing happens in Alg. 6.2, line 8, where the lower half
of the result is dropped due to the implicit shift to the right. The point of performing
the addition is purely to determine whether or not a carry is generated.
6.5 Analysis and Discussion
In this chapter we have presented a novel systematic non-linear arithmetic code which
is robust against adversarial injection of faults and statistically occurring random
faults (soft-errors). Based on this code we have introduced arithmetic primitives for
robust computation over encoded digits. We have further used the example of digit
serial Montgomery modular multiplication to demonstrate how robust arithmetic can
be deployed for fault-secure multi-precision public-key computations.
Quite naturally the robustness of our scheme adds overhead, which has a neg-
ative impact on performance. This is a price we have to pay for the non-linearity
that enables robustness. On the other hand this scheme offers unprecedented and
quantifiable robustness against even the most motivated and capable adversaries.
In terms of critical path delay we estimate that multiplication incurs a performance
hit of a little less than 100%, compared to a linear scheme, due to the re-computation
of the quadratic check symbol. For addition, the absolute overhead is roughly the
66 CHAPTER 6. ROBUST CODES
same, however, in terms of relative overhead it fares much worse.
There are a couple of ways to reduce the complexity, based on the properties of the
check modulus p, which is constant for a given implementation. Future research will
quantify more precisely the performance and area overhead, and compare a variety
of system parameters, i.e. digit size k, check modulus p, degree of parallelism, etc.
For example, for certain values of k there exist Mersenne prime check moduli, which
enable very efficient implementations for check symbol computation.
Our scheme scales reasonably well, since once the digit size is determined, the
complexity of the predictor and error detection networks remain constant. We would
like to emphasize that we clearly prioritize robustness over performance. Given the in-
creased vulnerability level of mobile and ubiquitous security devices, and the progress
in adversarial fault analysis techniques, we believe that this is a sensible argument.
6.5. ANALYSIS AND DISCUSSION 67
Algorithm 6.2 Robust Montgomery Multiplication
Require: d = {(0, 0), . . . , (0, 0)}, M ′0 = −M−10 mod 2k
1: for j = 0 to e− 1 do
2: if Check((a0, |a20|p), (bj, |b2j |p), (d0, |d20|p), (M ′0, |(M ′0)2|p), (M0, |M20 |p)) then
3: ((T1, |T 21 |p), (T0, |T 20 |p))⇐ RMUL((a0, |a20|p), (bj, |b2j |p))
4: (T0, |T 20 |p)⇐ RADD((T0, |T 20 |p), (d0, |d20|p))
5: (T1, |T 21 |p)⇐ RADDC((T1, |T 21 |p), (0, 0))
6: ((−,−), (U, |U2|p))⇐ RMUL((T0, |T 20 |p), (M ′0, |M ′20 |p))
7: ((T3, |T 23 |p), (T2, |T 22 |p))⇐ RMUL((M0, |M20 |p), (U, |U2|p))
8: (−,−)⇐ RADD((T0, |T 20 |p), (T2, |T 22 |p))
9: (T0, |T 20 |p)⇐ RADDC((T1, |T 21 |p), (T3, |T 23 |p))
10: (T1, |T 21 |p)⇐ (carry, carry)
11: for i = 1 to e− 1 do
12: if Check((ai, |a2i |p), (bj, |b2j |p), (di, |d2i |p), (U, |U2|p), (Mi, |M2i |p)) then
13: (T0, |T 20 |p)⇐ RADD((T0, |T 20 |p), (di, |d2i |p))
14: (T1, |T 21 |p)⇐ RADDC((T1, |T 21 |p), (0, 0))
15: ((T4, |T 24 |p), (T3, |T 23 |p))⇐ RMUL((ai, |a2i |p), (bj, |b2j |p))
16: (T0, |T 20 |p)⇐ RADD((T0, |T 20 |p), (T3, |T 23 |p))
17: (T1, |T 21 |p)⇐ RADDC((T1, |T 21 |p), (T3, |T 23 |p))
18: (T2, |T 22 |p)⇐ (carry, carry)
19: ((T4, |T 24 |p), (T3, |T 23 |p))⇐ RMUL((Mi, |M2i |p), (U, |U2|p))
20: (di−1, |d2i−1|p)⇐ RADD((T0, |T 20 |p), (T3, |T 23 |p))
21: (T0, |T 20 |p)⇐ RADDC((T1, |T 21 |p), (T3, |T 23 |p))
22: (T1, |T 21 |p)⇐ (carry, carry)
23: else
24: ABORT
25: end if
26: end for
27: (de−1, |d2e−1|p)⇐ (T0, |T 20 |p)
28: (de, |d2e|p)⇐ (T1, |T 21 |p)
29: else
30: ABORT
31: end if
32: end for
68 CHAPTER 6. ROBUST CODES
Chapter 7
Low Cost Techniques for Robust
Finite Field Arithmetic
Two important aspects in the selection of an error detection scheme are the overhead
that is imposed by the scheme, and its capability of detecting errors (coverage). In an
adversarial setting it is desirable to achieve error coverage close to 100%, simply be-
cause the probabilities of different classes of errors to occur are unknown. Such a high
degree of coverage may come at a cost too high to be justifiable. Depending on the
threat level and underlying error model, however, it may be possible to accommodate
a slight decrease in coverage if it significantly reduces the cost of implementation. In
this chapter we show how a special class of systematic non-linear codes with strong
error detection properties [KT04] may be applied to elliptic curves defined over a
specialized class of finite fields, dubbed optimal extension fields (OEF). We define
robust versions of common components, i.e. adder, multiplier, shifter etc. with non-
linearly encoded check-symbols, which are required for the realization of elliptic curve
cryptosystems. Our design drastically reduces the overhead in both area and speed
while maintaining a high level of confidence for detecting errors induced by an active
adversary.
69
70 CHAPTER 7. LOW COST TECHNIQUES
7.1 Optimal Extension Fields and their Arithmetic
Optimal extension fields were introduced by Bailey and Paar in [BP98]. The main
idea is to use a generating polynomial of the form f(x) = xk − w to construct the
extension field GF (qk), where q is selected as a pseudo-Mersenne prime given in the
form 2m±c with log2 c < bm2 c. The pseudo-Mersenne form allows efficient reduction in
the ground field. The following Theorem [LN83] provides a simple means to identify
irreducible binomials that can be used in OEF construction:
Theorem 7.1.1. Let k ≥ 2 be an integer and w ∈ GF (q)∗. Then the binomial xk−w
is irreducible in GF (q)[x] if and only if the following three conditions are satisfied:
1. each prime factor of k divides the order e of w in GF (q)∗;
2. the prime factors of k do not divide q−1
e
;
3. q = 1 mod 4 if k = 0 mod 4.
The representation of OEF elements utilizes the standard basis. An element A ∈
GF (qk) is represented as
A =
k−1∑
i=0
aix
i = a0 + a1x+ a2x
2 + . . .+ ak−1xk−1
where ai ∈ GF (q). The OEF arithmetic operations are performed as follows.
Addition/Subtraction:
The addition/subtraction of two field elements A,B ∈ GF (qk) is performed in the
usual way, by adding/subtracting the polynomial coefficients in GF (q) as follows.
A±B =
k−1∑
i=0
aix
i ±
k−1∑
i=0
bix
i =
k−1∑
i=0
(ai ± bi)xi
Multiplication:
Let A,B ∈ GF (qk). Their product C = A ·B is computed in two steps:
7.2. ROBUST BLOCK CODES 71
1. Polynomial multiplication:
C ′ = A ·B =
2k−2∑
i=0
c′ix
i
2. Modular reduction:
C = C ′ (mod f(x))
=
2k−2∑
i=0
c′ix
i (mod xk − w)
=
k−2∑
i=0
(c′i + wc
′
i+k)x
i + c′k−1x
k−1
In the first step the ordinary product of two polynomials is computed. In the reduc-
tion step the binomial f(x) = xk − w facilitates efficient reduction. The reduction
may be realized by only k − 1 constant coefficient multiplications by w, and k − 1
additions.
Inversion:
Standard algorithms to compute the multiplicative inverse of field elements are based
either on the extended Euclidean algorithm, or Fermat’s little theorem. These meth-
ods, however, are usually computationally expensive. Alternatively, there exists an
elegant method for inversion in binary extension fields was introduced by Itoh and
Tsujii [IT88], and later generalized by Guajardo and Paar [GP02]. For certain classes
of fields, but particularly optimal extension fields, this method is the most efficient
algorithm known to date. We will address some specifics of the algorithm a bit later
and refer the reader to [GP02] for full details.
7.2 Robust Block Codes
A class of non-linear systematic error detecting codes, so-called “robust codes”, were
proposed by Karpovsky and Taubin [KT04]. They can achieve optimality according
72 CHAPTER 7. LOW COST TECHNIQUES
to the minimax criterion, that is, minimizing over all (n, k) codes the maxima of the
fraction of undetectable errors Q(e) for e 6= 0. The following definition from [KT04]
rigorously defines a particular class of non-binary codes.
Definition 7.2.1. Let V be a linear p-ary (n, k) code (p ≥ 3 is a prime) with
n ≤ 2k and check matrix H = [P |I] with rank(P ) = n − k. Then CV = {(x,w)|x ∈
GF (qk), w = (Px)2 ∈ GF (qr)}
To quantify the performance of CV we need a metric. The error masking proba-
bility for a given non-zero error e = (ex, ew) may be quantified as
Q(e) =
|{x|(x+ ex, w + ew) ∈ CV }|
|CV | . (7.1)
Note that we call the code CV robust, if it minimizes maxima of Q(e) over all possible
non-zero errors. Reference [KT04] provides the following theorem which quantifies
the error detection performance of the non-linear code CV .
Theorem 7.2.1. For CV the set E = {e|Q(e) = 1} of undetected errors is a (k −
r)-dimensional subspace of V , pk − pk−r errors are detected with probability 1 and
remaining pn − pk errors are detected with probability 1− p−r.
These codes achieve total robustness for the case r = k, when the subspace of
undetectable errors collapses to the zero codeword and all non-zero error patterns
can be detected with a probabilty of either 1 − p−r or 1. The reduced overhead for
the case r < k is obtained at the expense of the loss of total robustness. Nonetheless,
some important properties of robust codes are retained: Contrary to linear encoding
schemes, the probability of missing an error is largely data-dependent. This has one
very important consequence. An active adversary trying to induce an undetected
error in the data would need to a) know the value of the data a priori in order
to compute an undetectable error pattern, and b) to induce a fault with sufficient
spatial and temporal accuracy such as to successfully re-create the matching error
7.3. SUITABLE CODES FOR OEF ARITHMETIC 73
in the device. In a linear scheme, any error pattern that is a codeword itself will
lead to a successful compromise. Although there also exist some error patterns in the
robust scheme which will escape detection, their number is significantly smaller than
in linear schemes.
7.3 Suitable Codes for OEF Arithmetic
Our objective is to protect OEF arithmetic against a sufficiently large class of er-
ror patterns, while keeping the overhead in performance low. As mentioned in the
previous section, full robustness can only be achieved with r = k, resulting in a du-
plication of the operand size and likely more than 100% overhead. Furthermore, as
we will show in the next section, only very specific choices of the sub-matrix P allow
us to define arithmetic operations in such a way that the check-symbol of the result
can be predicted efficiently based on the input operands’ check-symbol.
An alternative method for robust multi-precision arithmetic was presented in
[GSK06], which achieves total robustness by encoding single digits separately. Such
a high degree of protection, however, can only be obtained by accepting a substantial
amount of overhead on the predictor and error detection networks. Depending on the
anticipated threat level, such rigor may not be required. As long as the probability
of error detection is sufficiently high to render malicious fault insertions infeasible, a
lighter-weight scheme will work as well.
For our purposes we need a linear code over GF (q) with rank(P ) = n− k, which
we will transform into a robust code. The error correcting properties and ease of
decoding are irrelevant in this setting. For our purposes it suffices to select a simple
linear code which will allow us to build robust versions of the arithmetic operations
(as described in the next section) in an efficient manner. We therefore pick a simple
parity code of length n = k + 1. Note that by choosing r = 1 the r × n error
74 CHAPTER 7. LOW COST TECHNIQUES
check matrix H = [P |Ir] becomes simply H = [1 1 1 · · · 1 1]. Hence, for a vector a
representing an element of GF (qk), the matrix-vector product Pa may be computed
by simply summing the coefficients of a, i.e. Pa =
∑k−1
i=0 ai. Thus the robust encoded
form of a ∈ GF (qk) is simply
(
a, (Pa)2
)
=
< a0, a1, a2, . . . , ak−1 >,(∑
k
ai
)2
7.4 Robust OEF Arithmetic
To build an error detection network we need to provide robust realizations of the basic
OEF arithmetic operations, i.e. addition and multiplication. We wish to minimize
the overhead associated with computing and verifying the check-symbol. Hence, we
choose the linear code such that P maps elements of GF (qk) to GF (q). Note that
field inversions and elliptic curve point operations are implemented through various
compositions of these basic operations and hence there is no need to consider them
here explicitly.
The general architecture for implementation of robust OEF arithmetic is depicted
in Figure 7.1 and can be described as follows. A robust arithmetic operation consists
of three major components: a) the main data-path, b) the check-symbol predictor
and c) the error detection network (or, in the case of software implementation, the
error detection procedure).
Main Data-Path: The basic arithmetic operation which computes the result c =
c0 + c1x+ c2x
2 + . . .+ ck−1xk−1 without the check-symbol.
Predictor: Computes the expected check-symbol from input data. Whenever pos-
sible only the input operands’ quadratic check-symbols shall be used, i.e. (Pa)2
or (Pb)2. When this is not feasible, a linear check-symbol Pa or some portion
of the operands’ data may be used as well.
7.4. ROBUST OEF ARITHMETIC 75
Result and Check−Symbol
0,a1,a2, . . .,ak−1 b0,b1,b2, . . .,bk−1 (Pa) (Pa)2, (Pb) (Pb)2,
c0,c1,c2, . . .,ck−1 (Pc) (Pc)2,
(Σ)2 =?
Error
Detection
Network
Non−redundant
Data Path
Predictor
Error?
yes/no
Input Operands Input Check−Symbols
a
Figure 7.1: General Error Detection Architecture
Error Detection Network: Computes the actual check-symbol (Pc)2 from the re-
sult of the operation and compares it to the output of the predictor. If both
match, the result is deemed correct. If not, an alarm is raised and the opera-
tion is aborted. In hardware this error detection network is typically assumed
to be free of errors and thus may need to be implemented as a self-checking
architecture.
The intermediate result Pc that is computed in the error detection network as a
by-product of the check-symbol should be saved as part of an extended check-symbol,
since in many cases it is required for efficient realization of predictors, as we will see
in the following sections.
76 CHAPTER 7. LOW COST TECHNIQUES
7.4.1 Initial Encoding and Detection Network
Operands entering the device are typically not encoded in redundant form. We require
an initial encoding operation to establish a relation between the operand and its
(extended) check-symbol. As mentioned above, there is a clear benefit to retaining
the intermediate linear sum of coefficients as part of the extended check-symbol. We
thus represent encoded operands in the extended form
A =
[
< a0, a1, . . . , ak−1 >,Pa, (Pa)2
]
.
Initial encoding of an operand a and integrity checks of intermediate results can both
be realized in the same way. It requires k − 1 additions in the ground field GF(q)
to produce the linear sum Pa, followed by a squaring which yields the non-linear
check-symbol (Pa)2. The only difference lies in the subsequent comparison with a
predicted check-symbol, which is absent in the initial encoding.
7.4.2 Addition
Given two encoded inputs A = (a, Pa, (Pa)2) and B = (b, P b, (Pb)2) we want to
compute S = (s, Ps, (Ps)2) such that s = a + b ∈ GF (qk). This is done by first
computing the component-wise addition of the two field elements to form the sum
s = a+ b, then computing the associated check-symbol ws = (Ps)
2.
However, since we want to protect against faults that may occur during the
computation, we need a reference value that predicts the value of the new check-
symbol. The output of this predictor should ideally be based exclusively on the input
operands’ check-symbols, but intermediate values obtained during their computation
may be used as well. By using the extended check-symbols for a and b, Pa, (Pa)2
7.4. ROBUST OEF ARITHMETIC 77
and Pb, (Pb)2, we can predict the value of the sum’s check-symbol (Ps)2 as follows.1
(Ps)2 = (P (a+ b))2 = (Pa)2 + 2(Pa)(Pb) + (Pb)2
Since the computation of (Ps)2 takes place entirely in the ground field GF (q), it
is relatively inexpensive compared to the addition in GF (qk), thereby adding just a
minor amount of overhead. In the final step of the computation the error detection
network compares the output of the predictor to the actual check-symbol (Ps)2 of
the sum s that was computed by summing the coefficients si and squaring the result.
If they match, then the result (s, Ps, (Ps)2) is considered free of errors and passed on
to the next stage of the computation. In case of a mismatch an alarm signal is raised.
7.4.3 Multiplication
The check-symbol of the product c = ab ∈ GF (qk) is computed directly as Pc, (Pc)2.
The predicted check-symbol needs to be computed as
(Pc)2 = (P (ab))2 .
Normally we would be stuck at this point, since in general multiplication by matrix
P does not associate with the multiplication operation in GF (qk). Note, however,
that due to our particular choice of P the check-symbol of a is explicitly computed
as
(∑k−1
i=0 ai
)2
. Furthermore, observe the identity
(Pa)(Pb) =
(
k−1∑
i=0
ai
)(
k−1∑
i=0
bi
)
=
∑
i,j=0,1,...,k−1
aibj . (7.2)
1Here we are abusing the notation slightly by using juxtaposition for both matrix-vector prod-
ucts as well as for multiplication in GF (qk). Also we use vector and polynomial representations
interchangeably.
78 CHAPTER 7. LOW COST TECHNIQUES
On the other hand, assuming a modulus of f(x) = xk−w is used for the construction
of GF (qk), the actual value of the computed check-symbol for ab mod f(x) is
(Pc)2 = (P (ab))2 =
( ∑
0≤i+j<k
aibj +
∑
k≤i+j
waibj
)2
=
( ∑
0≤i,j<k
aibj +
(
(w − 1)
∑
k≤i+j
aibj
))2
After applying the identity from (7.2) and binomial expansion we obtain
(Pc)2 = (Pa)2(Pb)2 +
(
(w − 1)
∑
k≤i+j
aibj
)[
2(Pa)(Pb) +
(
(w − 1)
∑
k≤i+j
aibj
)]
which can be written in a more compact way as
(Pc)2 = (Pa)2(Pb)2 + TM [2(Pa)(Pb) + TM ] .
Hence, we can build a predictor with little cost. The key is to compute the term
TM = (w − 1)
∑
k≤i+j aibj efficiently. The summation may be realized by collecting
the bj terms that are multiplied by the same ai, i.e. a1 is multiplied with bk−1, a2 with
(bk−1+bk−2), . . . , ak−1 with (bk−1+bk−2+. . .+b1). Together with the multiplication by
the constant (w− 1) the computation requires k multiplications and (k+1)(k− 2)/2
additions in GF (q). In a digit serial architecture, however, we can compute even
more efficiently. Note how the collections of bj terms can be expressed as a partial
sums xi = xi−1 + bk−i for 0 < i < k and x0 = 0. Thus it suffices to simply compute
the current xi from a buffered previous value xi−1 and accumulate the products aixi
as depicted in Figure 7.2. This reduces the complexity to k multiplications and
2(k − 2) additions. The overall complexity for a digit-serial predictor is thus k + 3
multiplications and 2k − 2 additions in the ground field GF (q).
7.4. ROBUST OEF ARITHMETIC 79
M
bk−1...b1
a1...ak−1
T
xi−1
aixiΣ
i=1
k−1
xi
w−1
Figure 7.2: Data-flow Graph for the Digit-Serial Computation of TM
7.4.4 Squaring
It is generally known that one can compute the square of a field element more effi-
ciently than a general multiplication, by taking advantage of the symmetry of partial
products [Par00]. For the computation of the check-symbol we can apply the same
general procedure as for multiplication, only that now both operands are the same.
This opens the door to further optimization:
(Pc)2 = (P (a2))2 =
( ∑
0≤i+j<k
aiaj + w
∑
k≤i+j
aiaj
)2
=
(
(Pa)2 +
(
(w − 1)
∑
k≤i+j
aiaj
))2
=
(
(Pa)2 + TS
)2
As one can easily see, not only is the check-symbol computation shorter, but just like
with general squaring, we can take advantage of partial product symmetries in the
computation of the value TS. This symmetry is twofold. Since the partial product
aiaj is the same as ajai, we only need to compute it once for i < j. Once accumulated,
we double the temporary result with a shift to the left, add the sum of the squares
80 CHAPTER 7. LOW COST TECHNIQUES
a2i and multiply by w − 1, as follows.
TS = (w − 1)
2 ∑
k≤i+j,i<j
aiaj +
∑
i>k/2
a2i

The second symmetry is due to the possibility of grouping partial products into
products of sums, i.e. (a1 + ak−2)ak−1, (a2 + ak−3)(ak−1 + ak−2), (a3 + ak−4)(ak−1 +
ak−2+ak−3), and so forth. Here the first sum consists of two coefficients ai and ak−1−i
whose indices are symmetric about (k − 1)/2. Only if k is odd, the last iteration is
the product of the single term abk/2c and the sum (ak−1 + ak−2 + . . .+ abk/2c+1). This
grouping of terms can be easily translated into a digit-serial computation involving
only k multiplications and 2k−5 additions in GF(q). This brings the total complexity
for computing the check-symbol of a squaring to k + 1 multiplications and 2(k − 2).
7.4.5 Multiplication by Scalar
The multiplication of a field element a ∈ GF (qk) by a scalar, i.e. a single ground field
element c ∈ GF (q), is required in various situations arising in elliptic curve cryp-
tography. It is, for example, required in modular reduction and as part of inversion
operations based on the extended euclidean and similar algorithms. Further it serves
as the foundation for simple intra-coefficient shifting, i.e. multiplication by a scalar
which is a power of two. The coefficients of the field elements all have to be multiplied
by the same scalar ca = ca0 + ca1x+ ca2x
2 + . . .+ cak−1xk−1. The predictor output,
however can be computed very easily:
(P (ca))2 =
(
k−1∑
i=0
cai
)2
= c2(Pa)2
All that is needed to compute the new check-symbol, therefore, is one squaring and
one multiplication operation in GF (q). If the scalar is an operand in memory, i.e. not
hard-wired or -coded, then in all likelihood it is already protected by a check symbol
7.4. ROBUST OEF ARITHMETIC 81
(Pc)2 = c2, and thus the predictor requires only a single multiplication. This is the
case, e.g. during a step in the Itoh-Tsujii inversion algorithm, where the outcome of
an operation is an element in the ground field, i.e. with all higher degree coefficients
set to zero.
7.4.6 Shifting of Coefficients
While shifting is not (directly) an arithmetic operation it is sometimes needed in the
implementation of arithmetic functions. In many cases, shifting is only needed when
the least significant coefficient of the GF (qk) element has already been made zero
by simply adding the properly scaled version of the modulus polynomial. Note the
following simple yet useful property: As long as the shifted coefficients have a value
of zero the check-symbol for an element a ∈ GF (qk) which is defined as the linear
sum of the coefficients, i.e. (Pa)2 =
(∑k−1
i=0 ai
)2
is not altered. Hence, shifting does
not affect the check-symbol and comes for free.
7.4.7 Frobenius Maps
Iterates of the Frobenius automorphism are of particular interest to OEF arithmetic.
They play an important role in the inversion algorithm that was developed for normal
basis representation by Itoh and Tsujii [IT88] and later adapted for use with standard
basis by Guajardo and Paar [GP02]. The e-th iterate of the Frobenius map on the
field element a =
∑
k aix
i computes the exponentiation σe(a) = aq
e
in a very efficient
manner. This efficiency is based on the fact that OEFs are defined over irreducible
binomials of the form f(x) = xk−w. Note that exponentiation to a power of the field
characteristic q, yields a much simpler expression for the result than for the general
case:
aq
e
=
(∑
i<k
aix
i
)qe
=
∑
i<k
aix
iqe
82 CHAPTER 7. LOW COST TECHNIQUES
Reduction of this polynomial with respect to f(x) gives the result
aq
e
=
∑
i<k
aiw
biqe/kcxiq
e mod k =
∑
i<k
aiw
txs .
Observe that for a given parameter e the values wt and xs are constant and can be pre-
computed. Furthermore, note that the value of s is unique for each i = 0 . . . k−1, and
we may express the resulting re-ordering of coefficients as a permutation a′ = pie(a),
defined by the indexing function se(i) = iq
e mod k. Hence, we can view the e-th
iterate of the Frobenius automorphism σe(a) as the component-wise multiplication
(indicated here by the operator symbol ¯) of field element a with a pre-computed
element d(e) = d
(e)
0 +d
(e)
1 x+ . . .+d
(e)
k−1x
k−1, followed by a permutation of the resulting
coefficients:
c = σe(a) = pie(a¯ d(e)) (7.3)
= a0d
(e)
0 + a1d
(e)
1 x
se(1) + a2d
(e)
2 x
se(2) + . . .+ ak−1d
(e)
k−1x
se(k−1) (7.4)
= c0 + c1x+ . . .+ ck−1xk−1 (7.5)
We now want to incorporate Frobenius maps into our error detection scheme. For the
sake of a less confusing notation we will from now on refer to d(e) and its coefficients
simply as d. Observe that we only need to build a predictor for the component-wise
multiplication, since permutation of the coefficients is completely transparent and
does not affect the check-symbol. Therefore we only need to predict the check-symbol
(Pc)2 = (P (a ¯ d))2 = (∑k−1i=0 aidi)2. Unfortunately, we cannot simply multiply the
check-symbols (Pa)2 and (Pd)2 to do so. Rather take a subtractive synthesis approach
by first computing the product (Pa)(Pd) =
∑
i,j<k aidj and then subtracting a term
TF =
∑
i6=j aidj. Thus the predicted check-symbol is computed as follows:
(Pc)2 = (Pa)2(Pd)2 − TF (2(Pa)(Pd)− TF ) (7.6)
We will now demonstrate how to efficiently compute TF . Again, observe the identity
from (7.2). We may visualize the partial products aidj as a k × k matrix, with
7.4. ROBUST OEF ARITHMETIC 83
(Pa)(Pd) being the sum of all entries. The check-symbol of the component-wise
multiplication ai ¯ di is the square of the sum of all entries on the main diagonal.
Thus TF is the sum of all matrix coefficients other than the main diagonal, in other
words, the upper and lower triangular sub-matrices. We can compute these partial
products and their summation efficiently by grouping expressions with common terms.
For example, the upper triangular sub-matrix can be computed as the summation of
a1d0, a2(d0 + d1), a3(d0 + d1 + d2), etc. Similarly, we can compute the lower sub-
matrix in the exact same manner, only with the roles of a and d reversed. Figure 7.3
shows the data-flow graph for computing TF in a digit-serial fashion. This operation
requires 2(k − 1) multiplications and 4(k − 2) + 1 additions in GF(q). Together with
the operations in (7.6) the total cost of prediction is 2k+1 multiplications and 4(k−1)
additions.
i
ak−1...a0
bk−1...b0
TF
bixixi=xi−1+ai−1
yi=yi−1+bi−1 aiy
Figure 7.3: Data-Flow Graph for the Digit-Serial Computation of TF
7.4.8 Further Notes
Some remarks regarding our assumptions for the implementation of arithmetic: We
did not include fast multiplication algorithms like that of Karatsuba and Ofman[KO63]
in our complexity estimate, since their performance advantage depends on the latency
ratio of multiplication and addition operations. If both operations issue in a single
84 CHAPTER 7. LOW COST TECHNIQUES
clock cycle, as is the case for many modern RISC based processors, the increased cost
of more complicated control flow and irregular memory access patterns is not offset by
the little performance advantage[Har04]. The lower asymptotic complexity of KOA
mainly pays off for large operand sizes, as in RSA or Diffie-Hellman cryptosystems,
but not so much for the short operands used in elliptic curve schemes. Despite being
the slowest algorithm, Schoolbook multiplication is also relatively easy to implement.
7.5 Software Implementation on Embedded Pro-
cessors
The motivation for elliptic curve cryptosystems over optimal extension fields stems
primarily from the relatively weak performance of binary polynomial and large prime
field arithmetic in software based implementations. Most general purpose processors
do not support the arithmetic primitives that are required for efficient implementa-
tion, most importantly fast polynomial multiplication over GF(2). Instruction set
extensions that would help overcome these limitations have been proposed in [GS04],
but are unlikely to see implementation in mainstream processors in the near future.
OEF based implementations achieve a major speed-up due to the fact that the
size of field element coefficients and their arithmetic operations can be mapped near
perfectly to the native word size and ALU capabilities of microprocessors. The restric-
tion of coefficients to a modulus smaller than or equal to the wordsize also obviates
the need for elaborate carry-handling mechanisms as encountered in typical multiple
precision arithmetic.
The extension field operations we considered in the previous section can be real-
ized almost exclusively by using elementary ground field arithmetic, i.e. addition and
multiplication in GF(q). We will consider complexity only in terms of these basic
operations, since the individual implementation details are architecture dependent.
7.5. SOFTWARE IMPLEMENTATION 85
Table 7.1 lists the complexity of OEF operations for standard implementation, plus
the additional overhead required for robust implementation. Naturally these expres-
sions depend on the field extension degree k, while the size of operands in the ground
field is irrelevant in most cases, as long as they fit in the native word size of the
processor.
Table 7.1: Complexity of Various Extension Field Operations
Standard Implementation
Field op. #Muls in GF(q) #Add/Subs in GF(q)
Addition/Subtr. - AA = k
Multiply† MM = k2 + k − 1 AM = k2 − k
Square† MS = k(k + 3)/2− 1 AS = k(k − 1)/2
Scalar Product MP = k -
Frobenius Map MF = k -
Fermat Inverse∗ MFI = (δ − 1)MS + δ2MM AFI = (δ − 1)AS + δ2AM
Itoh-Tsujii Inverse MITI = ΓMM +∆MF +MP MITI = ΓAM
Robustness Overhead†
#Muls #Adds
Addition/Subtr. MRA = 1 ARA = 2
Multiply† MRM = k + 4 ARM = 2k − 2
Square† MRS = k + 1 ARS = 2k − 4
Scalar Product MRP = 1 -
Frobenius Map MRF = 2k + 1 ARF = 4(k − 1)
Fermat Inverse∗ MRFI = (δ − 1)MRS + δ2MRM ARFI = (δ − 1)ARS + δ2ARM
Itoh-Tsujii Inverse MRITI = ΓMRM +∆MRF +MRP ARITI = ΓARM +∆MRF
∗Based on Fermat’s Little Theorem, δ = dlog2(qk − 2)e
†Overhead for predictor. Overhead for detection is 1 Mul., k − 1 Adds. per Oper.
Γ = blog2(k − 1)c+HW(k − 1)
∆ =

Γ if k is odd
log2 k if k is a power of 2
blog2(k − 1)c+HW(k)− 1 otherwise.
86 CHAPTER 7. LOW COST TECHNIQUES
7.6 Analysis of the Performance Overhead in ECC
Point Multiplication
In this section we show how robust implementations of OEF cryptosystems on embed-
ded processors can be realized efficiently with a minimum of performance overhead.
We will base all of our performance analyses on the basic elliptic curve scalar point
multiplication, as this is the central and most complex operation of all ECC based
protocols. Based on the complexity analysis of OEF arithmetic in standard and
robust implementations in the previous section, we want to derive robust point mul-
tiplication overhead estimates for a couple of example fields. We begin by breaking
down the number of extension field operations required for a given set of parameters.
Table 7.2: Overhead of Robust OEF Arithmetic for ECC Point Multiplication
Field extension k 5 7 11 14 17
#MULs 40.27% 26.05% 14.88% 11.15% 8.89%
#ADDs 84.71% 63.44% 42.02% 33.49% 27.84%
#Instructions 62.49% 44.74% 28.45% 22.32% 18.37%
Table 7.3: Probability of Missing an Error in Robust OEF Arithmetic
size of ground field log2 q 45 32 21 16 14
k 5 7 11 14 17
Pr[e|Q(e) = 1] < 2−89 < 2−63 < 2−41 < 2−31 < 2−27
Pr[e|Q(e) = 0] < 2−44 < 2−31 < 2−20 < 2−15 < 2−13
Pr[e|Q(e) = q−1] ≈ 1, q−1 < 2−44 < 2−31 < 2−20 < 2−15 < 2−13
An elliptic curve over GF(qk) is specified by its defining equation y2 = x3+ax+b,
and the parameters a, b ∈ GF(qk). In this dissertation, as in [BHLM01], we will
fix one parameter a = −3, since it allows faster arithmetic without sacrificing much
7.6. PERFORMANCE OVERHEAD 87
flexibility. Let s ∈ Z2` be an `-bit scalar value with ` ≈ k log2 q, and P a base point
represented in affine coordinates by the tuple (xP , yP). Furthermore let Q be the
point resulting from multiplication of P with the scalar s, represented in Jacobian
projective coordinates by the triple (xQ, yQ, zQ). We opted to make use of mixed
Jacobian-affine coordinate representation for reasons of efficiency and popularity, but
note that other coordinate systems may be used in the same way.
Scalar point multiplication consists of repeated applications of point addition,
subtraction and doubling algorithms. For a detailed description of these algorithm
we refer the reader to [HMV04]. By representing s in non-adjacent form (NAF), a
scalar point multiplication requires `−1 point doublings and on average `/3 point ad-
ditions/subtractions. The NAF repeated-doubling algorithm, however, is susceptible
to sign-change fault attacks [BOS06]. We therefore recommend the use of a side-
channel resistant algorithm like the modified Montgomery-Ladder algorithm [FV06].
It requires 2(`− 1) point doublings and `− 1 point additions. We refer to [BHLM01]
for the operation count of point doubling (4 multiplications, 4 squarings, 5 addi-
tions/subtractions, and 4 multiplications with a scalar2) and point addition and/or
subtraction (8 multiplications, 3 squarings, and 7 additions/subtractions), without
going into too much detail. Conversion from Jacobian to affine coordinates at the
end of a point multiplication requires a single field inversion, one squaring and three
multiplications.
These instruction counts allow us to analyze the overall performance impact that
robust implementation of OEF arithmetic has on the performance of elliptic curve
point multiplication for various parameters, listed in Table 7.2. Since the relative
overhead only depends on the extension degree k, the table does not include the size
of q. The robustness of the scheme, i.e. the error detection probability, on the other
hand depends exclusively on the size of the ground field GF(q), a relationship that
2By scalar multiplication we refer to an operation M : GF(q)×GF(qk) 7→ GF(qk).
88 CHAPTER 7. LOW COST TECHNIQUES
is illustrated in Table 7.3. We would like to emphasize briefly that the security of an
elliptic curve system relies largely (but not exclusively!) on the size of the finite field
over which it is defined, as long as one can find a curve of prime order, approximately
the size of the field. The relatively large number of OEF parameters for field sizes
of practical interest [BP01], thus provides us with a straight-forward mechanism for
trading off the level of robustness and the acceptable performance overhead, without
sacrificing cryptographic strength. For example, the following two OEFs are 224 bits
in size: GF((216 − 15)14) and GF((232 − 5)7). The overhead for robustness with the
former field is around 10% in the number of GF(q) multiplications and 17% in GF(q)
additions, while it is 21% and 33% for the latter. By the same token, the chance of
missing an error in an operand of GF((232 − 5)7) with non-negligible probability is
approximately 232 times smaller than that in the field with higher extension degree,
and missing a random error is 216 times less likely.
7.7 Discussion
We have presented a strong scheme for error detection in optimal extension field arith-
metic, based on the theory of robust codes. Such schemes are necessary in defeating
ever new attack vectors on cryptographic hardware that keep being developed. Spe-
cial emphasis has been placed on keeping the overhead of error detection low and
tolerable for performance critical applications. The method is easily applicable to
both software and hardware implementations, and allows an easy trade-off between
error detection overhead and robustness.
Chapter 8
Tamper Resilient Control
Structures
In the previous chapters we focused exclusively on the protection of the data-path of
cryptographic circuits. Like any other computing machinery, however, these circuits
do not consist exclusively of a data path, but also of control structures like state
machines. Since any system is only as secure as its weakest component, it is impor-
tant to not leave Achilles’ proverbial heel uncovered and protect the control logic of
cryptographic circuits against fault attacks, as well.
Motivated by a hypothetical, yet realistic, fault analysis attack that in principle
could be mounted against any modular exponentiation engine, even one with appro-
priate data path protection, we set out to close this remaining gap. In the following
sections we present guidelines for the design of multi-fault resilient sequential control
logic based on standard Error Detecting Codes (EDC) with large minimum distance.
We introduce a metric which measures the effectiveness of the error detection tech-
nique in terms of the effort the attacker has to make in relation to the area overhead
spent in implementing the EDC. Our comparison shows that the proposed EDC based
technique provides superior performance when compared against regular N-modular
89
90 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
redundancy techniques. Furthermore, our technique scales well and does not affect
the critical path delay.
8.1 Introduction
The goal of this chapter is to provide the control logic of a cryptographic circuit
with a level of protection that is on par with other mechanisms protecting its data
path. As stated in Chapter 4 we consider an adversarial situation in which we want
to protect a system under worst-case conditions, rather than maximizing average
case reliability. As such it is important to provide each and every component of
the system with a uniform level of resilience against attacks. A secondary goal is to
quantify this resilience and provide some practical design considerations. We would
like to briefly remark on the issue of complexity and cost of implementation: compared
to the data path the control logic tends to be fairly small in size. The overhead is
therefore secondary to the improvement in fault-resilience and in most cases even
irrelevant. That said, we will show in Section 8.5 that our solution exhibits much
better performance when compared to traditional fault tolerant techniques of the
same resilience level. Specifically, the contributions contained in this chapter may be
summarized as follows:
• We discuss how a hypothetical attack targeting the control unit can be mounted
utilizing an adversarial error model that was defined in Chapter 4.
• We define a useful metric for measuring the effectiveness of the error detection
technique in terms of the effort the attacker has to make to mount a successful
attack with respect to the area overhead spent in building the EDC.
• We provide a detailed analysis of the complexity of the proposed EDC and show
that it provides superior error detection capabilities when compared against tra-
8.2. MOTIVATION 91
ditional modular redundancy based techniques, while not affecting the critical
path delay.
• We provide evidence that our technique provides better scalability compared to
traditional modular redundancy based techniques.
This chapter is structured as follows: Section 8.2 contains an example of a fault
attack on the control circuit of a hypothetical modular exponentiation based public
key accelerator. Section 8.3 will introduce notation and definitions of sequential
circuits, as well as classes of faults. State encoding schemes based on error detecting
codes (EDC) are introduced in Section 8.4 and used in Section 8.5 to design and
analyze fault resilient sequential circuits, complete with an example. Section 8.6
quantifies the error detection capabilities of various linear codes and explains the
importance of design diversity to counter certain classes of faults.
8.2 Motivation
We will motivate our research with an example attack scenario. We will demonstrate
the theoretical feasibility of an attack on a slightly simplified public key cryptographic
accelerator as it might be found in a ubiquitous security device such as a smart card.
For sake of simplicity let us assume a basic modular exponentiation algorithm without
CRT, using the secret private key, e.g. an RSA decryption or signature generation.
It should, however, be straightforward to adapt the attack to a CRT-based imple-
mentation. We consider a regular standard-cell ASIC implementation with sufficient
protection of the data path, but not the control logic, along with countermeasures to
prevent timing and power analysis (balanced power consumption, constant run-time).
We furthermore assume that the attacker has the ability to unpack the chip and in-
duce bit-flips on the state registers with some temporal precision [SA02], even though
it is our understanding that tamper-proof coating and/or packaging has become some-
92 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
thing of a standard practice among smart card manufacturers. The work of Anderson
and Kuhn [AK96] gives reason to question the effectiveness of such measures against
determined attackers.
A particular algorithm that meets the requirements set above is Joye and Yen’s
[JY02] variant of the Montgomery Ladder (Alg. 8.1), computing y = xe mod N .
The state machine in Figure 8.1 implements the algorithm on a data path with a
single multiplier and three registers R0, R1 and R2. Each register is n-bits wide; the
size of the modulus N . This highly simplified state machine consist of seven states:
the six active states, which are listed in Algorithm 8.1 right besides their respective
instruction, and the idle state in which the circuit is inactive. The initial state after
activation of the circuit sets up constants and counters. The two load states read the
variables x and e via the input bus, while the result state returns the computed value
y on the output bus. The multiply and square states are self-explanatory from the
algorithm description. The control signals a and b for selecting between registers R0
and R1 are determined by scanning the exponent in R2 bit-wise, as indicated by the
superscript bit-index in parentheses.
The complete public-key accelerator circuit consists of a modular multiplication
unit, a memory block for storage of operands and temporary results, as well as the
exponentiation state machine which coordinates the movement of data between multi-
plier and memory. The multiplier can also be used for squaring operations. Although
we will mainly focus on the state machine itself, we want to give a brief overview
of the multiplier component, for a more complete picture. It is a pipelined, scal-
able Montgomery multiplier as described in [Gau02], which can perform arithmetic
on both integer and binary polynomial operands. Scalable here refers to the ability
to perform arithmetic with operands of nearly arbitrary precision only limited by
the amount of memory, as well as the design time configurability of parameters like
the number of pipeline stages and data path width. The multiplier computes the
8.2. MOTIVATION 93
Idle Init
Load2
Multiply
Square
Load1
"001" "010"
"011"
"100"
"101"
"110"
"111"
Result
start=1
start=0
count < 0count    0≥
count    0≥
Figure 8.1: State Transition Diagram for Montgomery Ladder with Attack
94 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
Algorithm 8.1 Joye and Yen’s Montgomery Ladder Exponentiation [JY02]
Require: x, e
Ensure: y = xe mod N
R0 ← 1 INIT
R1 ← x LOAD1
R2 ← e LOAD2
for i = n− 1 downto 0 do
a← R(i)2 ; b← ¬a
Rb ← Rb · Ra mod N MULTIPLY
Ra ← R2a mod N SQUARE
end for
y ← R0 RESULT
Montgomery product C = A ·B ·R−1 using the word-level “finely integrated operand
scanning” (FIOS) algorithm, which consists of two nested loops. To reflect a typical
application scenario in an embedded system, we chose to implement a rather small
variant of the multiplier with an 8-bit data path and three pipeline stages. Each stage
consists of two 8×8-bit parallel multipliers and two adders, and can process one com-
plete inner loop of the algorithm. Intermediate results are computed LSW first and
can be passed from stage to stage. Delay registers with bypass-logic maintain proper
scheduling of data words between the pipeline stages. Each pipeline stage has its own
inner loop state machine, and the scheduling of each stage is controlled by a single
outer loop state machine. Since typically there are fewer pipeline stages than outer
loop iterations, a FIFO stores the result of the last stage and passes it back to the
first stage when it becomes available again. Table 8.5 in Section 8.5 gives the exact
area breakdown of the multiplier components.
We would like to point out that the multiplier currently has no fault detection
or other side-channel attack countermeasures. We use it mainly to demonstrate the
8.2. MOTIVATION 95
rather small percentage of control logic in relation to the overall area of the design.
Therefore the total increase in area will stay relatively small, even if the error detection
mechanism in the control logic adds significant overhead. This will become visible
later on in the synthesis results of our example.
Our main argument, however, is the following: Even if the data path of the circuit
were protected by attack countermeasures, the system would still be vulnerable to
fault attacks, unless the state machine is protected by an error detection mechanism.
The following is a brief outline of such a hypothetical attack, which reveals the n-bit
secret exponent e in short time:
1. Measure the total time of an (arbitrary) exponentiation and determine the com-
putation time required per exponent bit: tbit ≈ ttotal/n.
2. Set the round index k = 1, select a known ciphertext x and set the result of the
previous round as (R0,R1) = (1, x).
3. Compute the triple (A,B,C) = (R20,R0 ·R1,R21) mod N and make a hypothesis
H0 about the exponent bit e
(n−k) ∈ {0, 1}.
4. Start exponentiating x.
5. After k× tbit time steps, force the state machine into the state 111:RESULT, by
inducing a fault either on the least significant bit of the state or on the input
signal count < 0. This will cause the circuit to reveal its intermediate result
in R0.
6. Verify the hypothesis by comparing the actual value of R0 to A. If it matches
(e(n−k) = 0), set (R1,R0) = (A,B), else (e(n−k) = 1) set (R0,R1) = (B,C).
7. Repeat steps 3 through 6 for all other bits of the exponent (k = 2 . . . n).
96 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
This attack works because there is no way to distinguish between a valid or a faulty
transition (cf. Figure 8.1: solid vs. dashed arc) from the 110:SQUARE state to the
final state 111:RESULT. Without a doubt, an actual attack on real hardware will
probably be much more involved and not as simple as described here in our na¨ıve
scheme. Our basic point is, however, that the control logic of any cryptographic
circuit should be identified as a potential point of attack. This is especially true,
if other countermeasures against fault attacks on the data path are put into place,
since the control logic becomes the weakest element in the system and as such an
obvious target. To this end we propose an all-encompassing strategy to protect all
the components of embedded cryptographic devices with a certain minimum level of
guaranteed resilience to fault attacks.
8.3 Preliminaries and Definitions
A finite state machine can be formally defined by a six-tuple (S, I, O,∆,Λ, s0). S is
the set of valid states (the state space), which has dimension k = dlog2 |S|e, and s ∈ S
denotes the current state. In a similar manner I and O define the input and output
sets, i ∈ I and o ∈ O the current inputs and outputs. The number of input and
output signals are ι = dlog2 |I|e and ω = dlog2 |O|e bits. The sets ∆ and Λ represent
the next state and output logic functions, which take s and i as their arguments.
They consist of a collection of boolean functions δj(s, i) and λl(s, i), each of which
compute a single bit in the next state or output vector. Finally, s0 indicates the initial
state. For simplicity and without loss of generality we will assume that all 2k states
are valid throughout the remainder of this article. In the following we will refer to
state machines that encode the state in a vector with n > k bits as redundant state
machines.
A regular (read: non-fault resilient) state machine, Figure 8.2, encodes the current
8.3. PRELIMINARIES AND DEFINITIONS 97
Λ
i     I
∈o     O
δ1
δ0 λ0
λω−1
λ1
s     S∈
St
at
e 
Re
g.
k
ι
∆
.
 
.
 
.
δk−1
.
 
.
 
.
∈
Figure 8.2: Non-redundant State Machine
state as a k-bit binary vector s = (s0, s1, . . . , sk−1). In linear algebra terminology
the state space S is a vector space over the field GF(2), spanned by a set of k
linearly independent basis vectors ν0 . . . νk−1. The current state is a linear combination
s = s0ν0 + s1ν1 + . . . + sk−1νk−1, with addition in GF(2) being equivalent to an
exclusive OR operation. In the simplest case the basis vectors form a standard basis
with orthogonal unit vectors:
ν0 = (1 0 . . . 0)
ν1 = (0 1 . . . 0)
...
...
...
. . .
...
νk−1 = (0 0 . . . 1)
The next state s′ is determined by evaluating the set of next state functions:
s′ =∆k(s, i) = (δ0(s, i), δ1(s, i), . . . , δk−1(s, i))
Similarly, the ω-bit output vector o is computed by evaluating the set of output logic
functions Λω(s) = (λ0(s), λ1(s), . . . , λω−1(s)). At each clock-edge the current state
vector takes on the value of the next state vector s = s′. Mealy type state machines
are different only in that their output functions also depend on the input vector. In
98 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
the following we will focus on Moore type machines, but the same principles also
apply to Mealy machines.
8.3.1 Attacker Model
In the following we will assume that an attacker is somehow able to induce a fault
into the device that results in the flipping of a bit. The attacker has control over
the location of the bit flip, but there is a cost (or effort) attached, which increases
with the number of bit flips. For example, if the attacker tries to change the state of
a circuit by attacking the state register such that s˜ = s ⊕ e, the cost of the attack
depends on the hamming weight of the induced error pattern e.
8.3.2 A Novel Effectiveness Metric: Attack Effort per Area
We need a metric by which to compare the effectiveness of various error detection
mechanisms. An intuitive approach is to compare the cost or effort of a successful
fault attack to the cost of preventing it. Before we can define this metric we need to
introduce a couple of further definitions:
Resilience or Resiliency is a measure of how many errors the circuit can with-
stand. More concretely, this is the number of errors that can be accurately
detected by the error detection network. Throughout this chapter we will re-
fer to resilience as the parameter t and to a fault tolerant sequential circuit as
t-resilient.
Fault Insertion Effort is a measure for the difficulty of inducing a fault into the
system that manifests itself as one or more errors. It can not be given in
absolute terms, since an attacker’s effort depends on a large variety of factors.
We will therefore only give the effort in relative terms, compared to the cost for
induction of a single bit-error. There exist multiple types of faults all of which
8.3. PRELIMINARIES AND DEFINITIONS 99
warrant their own effort values. For simplicity we only model single bit faults,
which manifest themselves also as single bit errors at the gate level.
We now define the total attack cost as the sum of the total effort required for a
successful attack. We arbitrarily assign a nominal effort of 100% for a single bit error
as the basis of comparison for various countermeasures. If the circuit is more resilient,
then also the effort for a successful attack increases.
Definition 8.3.1. Let Ebf denote the effort required for inducing a change of value
for a single bit in the circuit (bit fault). Let t denote the degree of worst-case fault-
resilience of the circuit, i.e. the maximum number of errors that can be tolerated when
induced simultaneously. The total cost for the attack is defined as C = (t+ 1) · Ebf .
In Definition 8.3.1, we implicitly assume that the fault insertion effort increases
at least linearly with resilience, t. Inducing change of values for more than one
bit in the circuit may not be much harder than changing one bit, especially when
those bits are stored in places close to each other (e.g. several consecutive bits in a
register). However, our proposed scheme necessitates changing a number of certain
bits, which are not particularly close to each other, if the fault induction attack is to
be successful. Many successful fault induction attacks do not necessarily need a high
spatial precision, and inadvertently changing several other bits than the target bits
is acceptable in these attack scenarios. The proposed technique, on the other hand,
forces the attacker to be very precise when changing certain bits lest the attack be
detected. Therefore, changing several bits, not necessarily next to each other in the
circuit, without changing other bits is indeed a daunting task for the attacker. The
exact relation of fault insertion effort to the resilience, t depends largely on the specific
implementation, and therefore is difficult to formulate. Nevertheless, our assumption
of a linearly increasing effort for fault insertion with respect to t may be justifiable at
least as a first order approximation, since it is our intuition that the relation is even
more complex and the effort should increase faster than just linearly.
100 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
Based on Definition 8.3.1 and the area overhead which is required to implement
a specific countermeasure, we can define an effectiveness metric η which allows us to
compare the different approaches. The figure of merit is the ratio between cost and
area. Additionally we define a normalized effectiveness H which can be useful for
comparing countermeasures on circuits with completely different functionality.
Definition 8.3.2. Let A denote the circuit area and C the cost for successfully
implementing a fault attack. We define the effectiveness of the circuit as the ratio
η = C/A between the cost of the attack (in terms of effort) and the cost of the
countermeasure (in terms of circuit area). Let further η0 = C0/A0 be the effectiveness
of the unprotected circuit. The ratio H = η/η0 denotes the normalized effectiveness
of a countermeasure with respect to a particular basis circuit.
In Section 8.5 we will use the effectiveness metric to compare traditional fault
tolerance approaches with our method.
8.4 Redundant State Encoding
Looking at typical finite state machine implementations of cryptographic algorithms,
one may notice that oftentimes only relatively few different states are necessary, or
if a more complex algorithm is required the control logic can be broken down hierar-
chically. Typically only a few bits are necessary to store the state, depending on the
encoding scheme used. Compared to the size of the data path in a typical embedded
cryptographic system, the size of the control logic is often relatively small, even when
a highly redundant encoding scheme is being used. As an example we refer to the
Montgomery multiplier introduced above, which has a total area of 10,410 equivalent
gates, excluding the area for the data storage. The control logic occupies an absolute
area of 2,015 equivalent gates, around 20%; the remaining 80% belongs to the data
path.
8.4. REDUNDANT STATE ENCODING 101
Electronic design automation tools often allow specification of a preferred state
encoding scheme for the automatic synthesis of finite state machines. The typical
styles to choose from are “one-hot”, “gray” or “binary” encoding, but they only
allow a trade-off between the speed and the area of the resulting circuit. We propose
to investigate a third trade-off alternative, i.e. the degree of fault tolerance or, put
differently, the resilience against fault attacks. Although fault tolerant sequential
circuits have been the subject of research in the context of reliable system design, there
are a couple of important aspects that require further investigation in the context of
cryptographic circuits, due to the fundamentally different adversarial fault model.
We place a special emphasis on the detection, rather than correction, of faults.
Our argument is quantitative in that the number of detectable faults (d− 1) is larger
than the number of correctable faults (b(d − 1)/2c), based on the minimum Ham-
ming distance d of the coding scheme. Since there is no way of telling whether the
multiplicity of the fault is strictly less than d/2 or not, there is a chance that the
error correction will produce a valid, but ultimately incorrect next state. On the
other hand we also argue qualitatively: due to the adversarial nature of faults in the
cryptographic context and their potentially devastating effects on the security of the
system, the detection of errors is far more important than producing a result under all
circumstances. Detected faults should rather be dealt with under a suitable exception
handling mechanism, that prevents side-channel leakage, e.g. by erasing sensitive key
material. As a side benefit, the area overhead for detection is typically much less
than that of correction and the circuit does not have to sit on the critical path.
One-hot encoding is preferred by digital logic designers whenever speed and a
simple next state logic are desired. Despite its large amount of redundancy, its mini-
mum distance properties are weak, since valid codewords only differ by two bits. In
the case of a double error which resets the active bit and sets another one there is
not enough information in the encoding to detect the error, and thus one-hot en-
102 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
coding is not suitable for our purposes. Since performance is less of a priority in
security-critical applications, we can trade off the speed of one-hot encoding for the
capability to detect errors. One way of doing so is by choosing a state encoding based
on error detecting codes (EDC) with large minimum distance. For example, if we
had a state machine with m = 15 states, one-hot encoding would require a 15-bit
state vector. Encoding the states with a (15,4)-Simplex code, which is the dual of a
(15,11)-Hamming code, would require a state vector of exactly the same size. One-hot
encoding only has a minimum distance of 2, while the Simplex code has a distance of
8, which allows detection of all errors with a Hamming weight of up to 7.
8.5 Fault-Resilient Sequential Circuits
A fault-resilient state machine (Figure 8.3) must necessarily incorporate redundancy—
not only in the state encoding, but also in the combinational logic of next state and
output functions. In a state machine protected by a linear (n, k) code, the redundant
state vector s(n) = (s0, s1, . . . , sn−1) now consists of n bits. In the following we will
also implicitly assume that the input signal i is given in redundant form as well. Its
next-state logic is a set ∆n(s, i) of n functions δj, that each determine one bit of s.
In general, since there are only 2k states (but encoded in redundant form), each δj
is a boolean function over k + ιk variables (ιk denotes the number of non-redundant
bits in the input vector i).
Formally speaking, the state space S(n) is a k-dimensional subspace of an n-
dimensional vector space {0, 1}n. The basis vectors of this subspace are the n-bit row
vectors of the generator matrixG of the code. Any value of the state vector s which is
not a linear combination of these basis vectors is therefore treated as an error. While
this interpretation nicely illustrates the mapping of k-bit states into redundant n-bit
form, it is not useful for the definition of next state functions, because we want to keep
8.5. FAULT-RESILIENT SEQUENTIAL CIRCUITS 103
the state only in redundant form. If we were to decode the current state, compute
the next state in only k bits and re-encode it to n bits, this would leave the system
with a single point of failure in the next state function rendering it vulnerable to
attacks. Rather, we keep the system state in redundant form at all times, to enhance
its resilience against attackers probing for the weakest spot of the system.
An alternative interpretation that is more suitable for our definition of a redundant
state machine is to view the columns of G as n redundant basis vectors of the non-
redundant k-dimensional state space S, which is isomorphic to S(n). Any k linearly
independent basis vectors form a complete basis for S and span the entire state space.
Thus, those k (out of n) state variables that are associated with the respective vectors
of such a basis can be used for specification of a next state or output function. In
the following we will let σj and ξj denote such sets of k state variables associated
with linearly independent basis vectors (columns of G), for the next state and output
functions, δj respectively λj. Additionally we introduce non-redundant subsets κj of
input variables (one per next state function). Each κj selects ιk input variables from
the redundant input vector i. If the columns of G were not linearly independent,
then this would introduce an ambiguity about the current state and certain functions
could not be implemented. It usually depends on the exact construction of the code
to determine which of all
(
n
k
)
possible sets of k state variables are suitable.
If enough many sets σj, κj, ξj are available, then the next state and output logic
functions can be defined over a variety of such sets. It is prudent to make use of such
alternative definitions, so as to minimize the hazardous effect that a single faulty
state variable has on the overall system.
104 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
.
 
.
 
.
i     I δ1
δ0 s     S∈
∈o     O
σ0
σ1 ξ 1
ξ 0
λ1
λ0
λω−1
∆
ι
n
n
.
 
.
 
.
St
at
e 
Re
g.
n
Λ
.
 
.
 
.
k
k
k
Alarm
Er
ro
r
D
et
ec
tio
n
(sy
nd
rom
e)
κ
.
 
.
 
.
k
k
k
ξω−1
κ
κ0
1
n−1
σn−1 n−1δ
∈
Figure 8.3: Fault-resilient State Machine
Example
An example will help to illustrate the last point: Let G be the generator matrix of a
systematic (7, 3) Simplex code (dual of the (7,4)-Hamming).
G =

1 0 0 1 1 0 1
0 1 0 1 0 1 1
0 0 1 0 1 1 1

The rows of G form the k-dimensional basis for the state space S as a subspace
of {0, 1}n. Let s(n) = s(k)G be the redundant state vector. Using the modular
exponentiation example from Section 8.2, the states of a redundant state machine
would be encoded as in Table 8.1. Alternatively we can interpret the columns of G as
redundant basis vectors ν0, ν1, . . . , ν6 spanning S, and associated with s0, s1, . . . , s6,
respectively.
In a non-redundant state machine the next-state logic functions would be specified
as δj(s
(k), i). In the redundant case, we still have the same number of states, but now
we have n = 7 state variables (s0, s1, . . . , s6) that we can use to specify the next-state
8.5. FAULT-RESILIENT SEQUENTIAL CIRCUITS 105
Table 8.1: Simplex State Encoding for the Modular Exponentiation Example
State s(k) = (s0, s1, s2) s
(n) = (s0, s1, s2, s3, s4, s5, s6)
Idle 001 0010111
Init 010 0101011
Load1 011 0111100
Load2 100 1001101
Multiply 101 1011010
Square 110 1100110
Result 111 1110001
functions. Due to systematic encoding we have the following relationships between
redundant (s0, s1, s2, s3, s4, s5, s6)
(n) and non-redundant state variables (s0, s1, s2)
(k):
s
(n)
0 = s
(k)
0
s
(n)
1 = s
(k)
1
s
(n)
2 = s
(k)
2
s
(n)
3 = s
(k)
0 ⊕ s(k)1
s
(n)
4 = s
(k)
0 ⊕ s(k)2
s
(n)
5 = s
(k)
1 ⊕ s(k)2
s
(n)
6 = s
(k)
0 ⊕ s(k)1 ⊕ s(k)2
Let for example σj = (s1, s3, s6). Since columns 1, 3 and 6 of G are linearly inde-
pendent, it is generally possible to specify an arbitrary next state or output function
over this set of state variables, e.g. δj(σ, i) = δj(s1, s3, s6, i0). This is always possible
and does not depend on what the actual application of the state machine is1. The
boolean function definition of δj, however, does depend on the application.
1Once a specific function is defined, it might not need the complete state information in k variables
and the logic implementation may be minimized.
106 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
Table 8.2: State Transition Table of the Fault-Resilient FSM
State s0 s1 s2 s3 s4 s5 s6 start cnt< 0 s′0 s′1 s′2 s′3 s′4 s′5 s′6
(illegal) 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0
idle 0 0 1 0 1 1 1 0 x 0 0 1 0 1 1 1
idle 0 0 1 0 1 1 1 1 x 0 1 0 1 0 1 1
init 0 1 0 1 0 1 1 x x 0 1 1 1 1 0 0
load1 0 1 1 1 1 0 0 x x 1 0 0 1 1 0 1
load2 1 0 0 1 1 0 1 x x 1 0 1 1 0 1 0
mult 1 0 1 1 0 1 0 x x 1 1 0 0 1 1 0
sqr 1 1 0 0 1 1 0 x 0 1 0 1 1 0 1 0
sqr 1 1 0 0 1 1 0 x 1 1 1 1 0 0 0 1
res 1 1 1 0 0 0 1 x x 0 0 1 0 1 1 1
Conversely, it should immediately become clear that we should not specify a
next-state function over the sets (s0, s1, s3) or (s0, s2, s4). The bases associated
with the selection of state variables are {ν0, ν1, ν3} = {(1, 0, 0), (0, 1, 0), (1, 1, 0)} and
{ν0, ν2, ν4} = {(1, 0, 0), (0, 0, 1), (1, 0, 1)}, both of which are not linearly independent:
ν0 + ν1 + ν3 = 0 and ν0 + ν2 + ν4 = 0 in GF(2). In both examples one coordinate
is always zero in all basis vectors. If a hypothetical next state function depends on
that coordinate to distinguish between two different states, the function cannot be
implemented.
We will now give a concrete example of a fault-resilient state machine imple-
mentation using the error detecting (7, 3) Simplex code. We will further compare our
approach to N -modular redundancy, a concept from traditional fault tolerant compu-
tation, with respect to area overhead and fault-resilience. In N -modular redundancy
functional modules are replicated N times, ideally employing design diversity tech-
niques. Their outputs are either voted on by a majority logic (i.e. for error correction)
or simply compared for consistency to detect any error that may occur in a module.
8.5. FAULT-RESILIENT SEQUENTIAL CIRCUITS 107
The most common configuration is triple modular redundancy (TMR), sometimes also
referred to as triads [HJSI75], which allows either the detection of two errors or the
correction of a single error. Since a fault-resilient system requires redundancy not
only in the state machine but overall, redundant versions of the input and output
signals need to be present as well. In a system with non-redundant inputs, for exam-
ple, the attack described in Section 8.2 could be carried out by forcing the condition
count < 0, which may be indicated to the state machine by a single status bit. Such
an attack is harder to accomplish if input signals are available as multiple redundant
and independent variables. The amount of redundancy should be chosen to protect
against the same number of errors t as the state encoding scheme, to ensure a uni-
form level of resilience for the entire system. The level of resilience is measured by the
minimum number t of signals an attacker needs to change (bit-flips) for a successful
attack. For state encoding with linear codes over GF(2) we have t = d−1, where d is
the minimum distance (or non-zero Hamming weight) of all code words. In a modular
redundant scheme the overall resilience level can be found by concatenating the state
vectors of the individual modules and computing its total minimum distance D and
resilience t = D − 1. They are determined by the degree of modular redundancy N
and the distances dj of each module’s individual state encoding:
D =
N∑
j=1
dj
In the following we will therefore require a D-fold redundant set of input signals to be
available to the circuit, which in turn must provide an D-fold redundant set of output
functions. This provides coverage for the corner case that all t faults occur on the
inputs. For example, in the case of a TMR system with simple binary encoded states
the minimum distance between words of the concatenated state vectors is 3 (D = N)
with a worst case resilience of t = 2. Our EDC-based state machine implementation
is a special case with N = 1, d = 4 and therefore requires an input set of distance 4
108 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
to maintain the same amount of resilience against D − 1 = 3 faults.
We will now provide an example using the Montgomery Ladder algorithm intro-
duced earlier in section 8.2. A very simple state machine implementation requires
two input signals, start and count!=0, and the D-fold redundant input vector
thus consists of the signals i = (i0,0, . . . , i0,D−1, i1,1, . . . , i1,D−1), with the mapping
i0,j = start and i1,j = count!=0. It computes three output signals for control-
ling the data path (multiplier): mul op a, mul op b and mul res. Again, these sig-
nals will be generated with D-fold redundancy and mapped to the output vector
o = (o0,0, o0,1, o0,2, . . . , o2,D−1).
As with the redundant state vector, we will not use all elements of i for definition
of the next state functions, but only subsets κj, each containing two input variables.
Finally, we need ω sets ξj of k state variables each as inputs to the output functions
λj, again with minimal overlap. For our example we chose:
σ0 = (s0, s2, s3)
σ1 = (s1, s3, s4)
σ2 = (s2, s4, s5)
σ3 = (s3, s5, s6)
σ4 = (s0, s4, s6)
σ5 = (s0, s1, s5)
σ6 = (s1, s2, s6)
,
κ0 = (i0,0, i1,0)
κ1 = (i0,1, i1,1)
κ2 = (i0,3, i1,2)
κ3 = (i0,0, i1,3)
κ4 = (i0,1, i1,0)
κ5 = (i0,2, i1,2)
κ6 = (i0,3, i1,3)
,
ξ0 = (s0, s1, s2)
ξ1 = (s0, s3, s5)
ξ2 = (s3, s4, s6)
ξ3 = (s2, s3, s4)
ξ4 = (s1, s2, s4)
ξ5 = (s3, s5, s6)
ξ6 = (s0, s4, s5)
ξ7 = (s2, s5, s6)
ξ8 = (s1, s2, s6)
ξ9 = (s0, s3, s4)
ξ10 = (s1, s5, s6)
ξ11 = (s0, s1, s6)
From the state transition table (Table 8.2) we can now derive the concrete next state
8.5. FAULT-RESILIENT SEQUENTIAL CIRCUITS 109
functions based on the state and input variable sets σj and κj:
δ0(σ0, κ0) = s0s
′
2 + s2s3
δ1(σ1, κ1) = i0,1s1s
′
3s4 + i0,1s
′
3s4i1,1 + s
′
1s
′
3s4i1,1 + s3s
′
4
δ2(σ2, κ2) = s2s
′
4s
′
5 + s
′
2s4 + s
′
2s5 + s4s5i
′
1,2
δ3(σ3, κ3) = s
′
3s5s
′
6i
′
0,0 + s3s
′
5 + s3s6 + s5s6i1,3
δ4(σ4, κ4) = s0s
′
4 + s
′
0s4s
′
6 + s
′
0s4i
′
1,0 + s
′
4s6
δ5(σ5, κ5) = i
′
0,2s0 + s0s
′
5 + s
′
1s5
δ6(σ6, κ6) = i0,3s1s
′
6 + s1s2 + s2s6
We can derive the output logic equations in a similar fashion (omitted here due to
space considerations).
8.5.1 Synthesis Results and Overhead Analysis
Due to the principal difference in error detection capability between an N-modular re-
dundant and an EDC-based state machine, we decided to compare the non-redundant
implementation to four different redundant implementations with slightly varying pa-
rameters:
1. a 2-fault resilient triple modular redundant (TMR, N = 3) version
2. a 3-fault resilient quadruple modular redundant (QMR, N = 4) version
3. an internally 3-fault resilient EDC version with a 3-fold redundant I/O set (3-
EDC)2.
4. a 3-fault resilient EDC version with a 4-fold redundant I/O set (4-EDC).
Our initial analysis consisted of using the UC Berkeley SIS logic synthesis tool and
the MCNC synthesis script for mapping the five different circuit descriptions to the
2This effectively reduces the worst case fault resilience to 2 faults on the input set.
110 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
MCNC standard libraries mcnc.genlib and mcnc latch.genlib and comparing the
results (Table 8.3).
8.5.1.1 Area Overhead
Table 8.3: Initial Analysis of Various EDC and NMR Schemes
Non-red 3-EDC 4-EDC TMR QMR
# Inputs 2 6 8 6 8
# Outputs 3 10 13 10 13
# SOP-Literals 42 155 155 135 179
Gate Count 19 56 60 59 77
# Latches 3 8 8 10 13
Total Area 64 207 210 210 276
Area Overhead (%) 0 223 228 228 331
Despite the higher literal count of 3- and 4-EDC compared to the TMR imple-
mentation, the actual circuit area is almost exactly the same. The lower resilience
of TMR against faults, however, makes a compelling argument for building fault re-
silient state machines based on error detecting codes, especially since the degree of
resilience will be even better for larger state space dimension k. A quadruple modu-
lar redundant design requires more than 100% more overhead for achieving the same
level of resilience. In addition, while N -modular redundancy schemes have a constant
overhead of approximately (N−1) ·100% independent of the state space dimension k,
the (storage) overhead of EDC based fault resilient state machines actually decreases
with a larger state space (cf. Fig. 8.4). It appears reasonable to expect a similar
trend for the complexity of the next state logic.
These initial results encouraged us to proceed with the analysis of a more detailed
state machine model synthesized using Synopsys DesignCompiler and a 0.13µm ASIC
8.5. FAULT-RESILIENT SEQUENTIAL CIRCUITS 111
standard cell library. Special care was taken to direct the synthesis not to share logic
between the next state functions δn. This is to avoid the manifestation of a single
stuck-at fault as an error on multiple outputs of the next state function ∆n(s, i). The
results3 are given in Table 8.4. It shows some clear differences in area requirements
Table 8.4: Analysis of Detailed FSM Models (∗10-state FSM with k = 4)
Non-red 3-EDC 4-EDC TMR QMR non-red.∗ EDC
Total Area 48 137 143 189 249 71 361
Area Overhead 0.0% 188.1% 199.3% 295.8% 422.4% 0.0% 413.3%
Resilience t 0 2 3 2 3 0 7
Attack Effort 100% 300% 400% 300% 400% 100% 800%
Effectiveness η 2.098 2.184 2.804 1.590 1.606 1.422 2.216
Norm. Effect. H 1.000 1.041 1.336 0.758 0.766 1.000 1.559
between the codes based and the TMR implementation, which are not visible from
the initial analysis. To some extent this is due to the more accurate modeling (we only
used a simplified state machine design for the SIS flow), but the main reason is the
fact that we used an industry-strength standard cell library with advanced complex
gates of varying drive strength. The size of these gates is often more compact than
a realization from simple gates, and given in precise decimal fraction area numbers.
The size of gates in the MCNC library on the other hand was only given in integers.
8.5.1.2 Effectiveness and Scalability
Table 8.4 also gives the performance of our examples in terms of the effectiveness met-
ric for fault resilience measures that we defined in Section 8.3, Definition 8.3.2. The
value η describes the relative effort per gate, while H is the normalized effectiveness
value relative to the effectiveness of the unprotected circuit. The results are quite
3The area is given in terms of equivalent gate size, i.e. the size relative to a 2-input NAND gate.
112 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
interesting: As expected, the EDC based fault resilient circuits show the best effec-
tiveness in preventing fault attacks, however, the N-modular redundancy approach
fares much worse than having no countermeasure at all. We also experimented with
a slightly different implementation with 10 states and a much larger resilience t = 7.
Here we used a (15,4)-Simplex code for state encoding. Our results indicate that at
slightly more than 400% area overhead the EDC based scheme scales much better
than N-modular redundancy, for which we expect an overhead of well beyond 800%
at the same resilience level.
8.5.1.3 Critical Path Overhead
We could not determine any substantial critical path overhead with any of our ex-
amples. All designs synthesized at 1GHz clock frequency without problems, and we
would expect the critical path to be on the data path much rather than the control
logic. In the case of 3-EDC and 4-EDC the redundant state variables are computed
in parallel, completely independent of each other, and therefore do not add to the
critical path. Furthermore, the error detection network is not in the critical path
either, since it operates in parallel with the next state and output logic. We conclude
that there is no performance overhead associated with these fault resilience measures.
8.5.2 Influence of Control Logic Overhead on Total Circuit
Area
Following the overhead analysis of the exponentiation state machine we want to ex-
trapolate the total overhead of implementing fault resilient control logic throughout
the entire modular exponentiation circuit. We start from the break-down of area re-
quirements for the Montgomery multiplier described in Section 8.2. Table 8.5 shows
the components of the multiplier sorted by type and their size in equivalent gates.
For extrapolation we factored the overhead estimates from Table 8.4 into the area
8.6. SELECTION OF CODES 113
Table 8.5: Data Path / Control Logic Area Comparison of Montgomery Multiplier
Component (Type) Area (equiv. gates) Percentage
Pipeline Stages (Data Path) 6,756 64.90%
Delay Registers (Data Path) 1,640 15.75%
Inner Loop FSM (Control Logic) 1,079 10.36%
Outer Loop FSM (Control Logic) 848 8.15%
FIFO control (Control Logic) 87 0.84%
requirement for control logic, but not for the data path. Due to the fact that control
logic occupies less than 20% of the total area in a non-redundant implementation, the
overhead due to fault resiliency techniques is limited to around 40% for our method,
whereas for quadruple modular redundancy it exceeds 80% (Table 8.6).
Table 8.6: Total Overhead for Modular Exponentiation Circuit with FT Control
Non-red 3-EDC 4-EDC TMR QMR
Data Path 8,396 8,396 8,396 8,396 8,396
Control Logic 2,062 5,942 6,173 8,163 10,773
Total Area 10,458 14,338 14,569 16,559 19,169
Overhead 0.00% 37.09% 39.30% 58.33% 83.29%
8.6 Selection of Codes for t-Fault Resilient State
Machines
We have seen in the previous example how we can design a fault-resilient state machine
with up to 8 states using an ordinary (7,3) Simplex code. Since this code has a
minimum distance of four, it is able to detect up to three faults in the state vector
114 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
s. In general, the number of errors that can be detected by means of a linear block
code with minimum distance d is t = d − 1. The selection of a particular code for a
fault-resilient state machine implementation therefore depends on the desired level of
resilience, i.e. the number of detectable errors t, as well as the dimension of the state
space k. These two parameters determine the length n of the block code, as shown in
Table 8.7. We obtained our numbers from the reference table of minimum-distance
bounds for binary linear codes [HS73]. We restructured the table in order to display
the minimum length n for a given dimension k and desired fault-resilience t.
Table 8.7: Minimum Code Lengths n for State Space of Dimension k and Fault
Resilience t
t \ k 1 2 3 4 5 6 7
1 2 s3 4 5 6 7 8
2 3 5 6 7 9 10 11
3 4 6 s7 8 10 11 12
4 5 8 10 11 13 14 15
5 6 9 11 12 14 15 16
6 7 11 13 14 15 17 18
7 8 12 14 s15 16 18 19
The first column of the table consists entirely of trivial repetition codes, because
of k = 1. The first row, on the other hand, consists of all parity codes with no
repetitions of column vectors, but with only a minimum distance of 2. These two are
the extremes between which we can trade off the fault resilience versus the code length
(or rather compactness). This is visualized in Figure 8.4, where EDC with different
levels of fault resilience are juxtaposed with one-hot and binary encoding in a double
log-scale graph. For a very small number of states adding resilience against multiple
faults incurs some overhead penalty, even when compared to one-hot encoding. But
8.6. SELECTION OF CODES 115
2 5 10 30 100
1
2
5
10
20
50
100
No. States
Co
de
 L
en
gt
h
binary (t=0)
one−hot (t=1)
EDC (t=1)
EDC (t=3)
EDC (t=5)
EDC (t=7)
Figure 8.4: Lengths of Various t-Resilient Error Detecting Codes
as the state space grows, the complexity of one-hot encoding grows much more rapid
with each additional state and eventually becomes infeasible. On the other hand,
the relative overhead of error detecting codes over simple binary encoding decreases
monotonically.
Duplication of columns in G is an easy and efficient way to increase the minimum
distance—and therefore the resilience—of an EDC. This is helpful against adversar-
ial single-mode failures such as light attacks. In fact, whenever the code length n
exceeds 2k, G inevitably contains some duplicated column vectors. Take, for exam-
ple, the (10, 3) code of resilience four: With k = 3 there can only be seven unique
non-zero column vectors in G; hence there must be three duplicates. It turns out
that Simplex codes achieve maximum fault resilience possible without duplication of
columns. Their generator matrix consists of all non-zero column vectors of dimension
116 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
k. They are marked in Table 8.7 with the small letter s. If G contains duplicate
columns, this means that two or more state variables associated with those columns
will have the same value (under fault-free conditions), and could thus be computed
by identical next state functions. The inherent problem of identical functions is their
susceptibility to common-mode failure as defined earlier in Section 4. If the cause of
a fault is not locally limited, it is likely to have the same effect on identical circuit
implementations of a boolean function, and therefore increase the fault multiplicity.
8.6.1 Using Design Diversity to Counter CMF
We can counter the effects of CMF using a technique called design diversity [MM00].
By implementing the next state function in several different, but ultimately equivalent
ways, the likelihood of combined failure under the same circumstances is reduced.
Design diversity is naturally promoted by defining functions over alternative sets of
k state variables, as described in Section 8.5. It is, however, important to not choose
the σj na¨ıvely. For example, if we were to define the next state functions δj over only
those sets σj that all include the state variable s0, then we would have biased the
circuit. A fault in s0 might spread out to cause subsequent errors in all other state
variables. An active adversary could systematically test for such a bias and exploit
it. We must therefore balance the use of each state variable across all σj. In order to
reduce the influence of a single bit error in any of the state variables, the subsets σj
should uniformly cover all state variables, with minimum overlap. In a k-dimensional
state space the minimum cover implies that each state variable be used a maximum
of k times. The goal is therefore to find n sets σj chosen uniformly from s(n) such
that each state variable occurs only k times. Oftentimes a minimum cover can be
found relatively easy, simply by inspection. Automating the process should be fairly
straightforward, but lies outside of the scope of this work. Returning to our previous
8.6. SELECTION OF CODES 117
example with the (7,3) Simplex code we could choose the subsets as follows:
σ0
σ1
σ2
σ3
σ4
σ5
σ6

=

(s0, s2, s3)
(s1, s3, s4)
(s2, s4, s5)
(s3, s5, s6)
(s0, s4, s6)
(s0, s1, s5)
(s1, s2, s6)

One can readily verify that the basis vectors associated with the sets σj are all linearly
independent and no state variable gets used more than k times. In other words, there
is no single state variable that stands out over others as a potential target for an
attack.
8.6.2 Attack Resilience
Finally we would like to show intuitively how the fault resilient circuit holds up
in different attack scenarios. We start with the most likely point of attack for an
adversary with single-mode fault insertion capability, the state register, and continue
from there towards the boundaries of the sequential circuit.
State register. A transient fault can completely change the register contents past
the duration of the fault, since the flip-flops are made up from bistable devices.
Here the structural design of the fault-resilient state machine comes into play:
Since the error detection network is connected directly to the output of the state
register, all faults up to multiplicity t will be detected. Note that this would not
be the case if the detector were to directly follow the next state function. From
our choices of σj one can see that each state variable has an influence on up to k
variables of the next state and the attacker would only require a fraction dt/ke of
118 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
faults to influence more than t state variables. With error detection right after
the state register this threat is eliminated. Furthermore, the detection network
does not add to the critical path, thereby avoiding a performance penalty.
Feedback path. Any fault on the feedback path from the state register to the inputs
of the next state function would be detected by the error detection circuit as
well.
Next state logic. Fault multiplicity stemming from shared logic between individual
next state functions is not likely, due to the minimum cover policy of selecting
the sets σj, i.e. there should not be any pair of state variables that is a member
of more than one σj. But even if that could happen, it would be easy to simply
disallow sharing of logic gates between individual functions during synthesis.
I/O set. The input and output vectors must also be considered as a potential point
of failure, as mentioned earlier. It is not clear at this point, if the error detec-
tion mechanism of the sequential circuit alone can guarantee t-fault resilience.
It might be necessary to have additional error detection outside of the state
machine, checking for faults on the I/O set, but this is outside of the scope
of this work. The minimum requirement is that the I/O set should contain a
sufficiently high amount of redundancy to enable t-fault resilience, as described
in the example in the previous section. One potential solution is to use weight-
based codes as described in [DT99].
As we can see from this intuitive analysis, the architecture of our sequential circuit
can withstand fault insertion of degree t, which was our design goal.
8.7. DISCUSSION 119
8.7 Discussion
In this chapter we addressed the importance of not only protecting the data path of
cryptographic circuits against fault attacks, but also the control logic itself. Despite
the abundance of fault tolerance techniques for sequential circuits in other application
domains, those are based on a fundamentally different fault model and therefore
not suitable for the task. We strongly believe that these techniques have to be re-
evaluated and adapted to the new threat scenario that presents itself in the form
of cryptographic fault analysis. A determined attacker can probe a system for its
weaknesses and enforce a worst case scenario. For protection against an adversary
with the capability to induce t faults simultaneously, the circuit must be resilient up
to that degree. Most of the ideas proposed in the literature, however, are concerned
with efficiency and the ability to detect two and correct at most a single error. This
work is a first step towards fault resilient sequential circuits under an adversarial
fault model. We gave a motivating example of a hypothetical attack on a modular
exponentiation based crypto system that would succumb to this kind of attack, in
spite of a protected data path and other algorithmic countermeasures. While it is
impossible to ever completely protect against a determined adversary, we presented
some general principles of how to improve the resilience against an attack, to make
it infeasible beyond a certain level of effort.
We showed that the overhead incurred through the use of error detecting codes
is less than that of N-modular redundant implementations, while providing better
fault resilience properties. In addition to this we demonstrated that the percentage
of control logic in a typical cryptographic application is only a small fraction of the
data path. Thus, even a high amount of overhead due to fault resilience measures
does not significantly influence the complexity of the overall system. It is, however
important to understand the mechanisms of fault propagation to properly implement
fault resilience. The rule for minimal overlap between state variables in the sets σn
120 CHAPTER 8. TAMPER RESILIENT CONTROL STRUCTURES
is such an example. Furthermore, we defined a new metric which captures the ef-
fectiveness of a fault detection method in terms of the minimum effort the attacker
has to make to mount a successful attack with respect to the area overhead spent in
implementing the error detection scheme. Our analysis and implementation results
have shown that the proposed coding based techniques provide far superior perfor-
mance when compared against N-modular redundant error detection techniques. For
proof of concept we implementated a Montgomery muliplier whose control logic was
realized using various error detection techniques. Our implementation results show
that the control logic occupies less than 20% of the total area in a non-redundant im-
plementation, and the overhead due to fault resiliency techniques is limited to around
40% for our method, whereas for a comparable quadruple modular redundancy it ex-
ceeds 80%. Furthermore, the effectiveness metric gives a high value of 2.8 for the
proposed technique whereas for QMR it is a mere 1.6. At around 75% effectiveness
after normalization it is clear that N-modular redundancy performs even less effective
than the unprotected circuit. For the same implementation, we found that the critical
path delay was not affected by the EDC. We provided rationale to claim that this
result would hold for even higher degrees of resiliency.
Further research is certainly required in this area. It is our hope that this first
step will encourage others to further contribute to this important aspect of embedded
cryptography. One open question, for example, is whether other types of codes, e.g.
Reed-Solomon etc., would provide for higher effectiveness. For the time being we only
investigated linear block codes due to the simpler error detection mechanism.
Chapter 9
Conclusion and Outlook
This dissertation we addressed the question of how to protect hardware implementa-
tions of public key cryptography against tampering attacks of a skilled adversary.
We presented three different ways to protect the finite field arithmetic of pub-
lic key cryptosystems. Each technique has its unique applications, advantages and
disadvantages, and their uses should be carefully evaluated on an individual basis.
Additionally, we realized that it is insufficient to only protect the data path of a
cryptographic hardware implementation. An attack on the control logic of the circuit
may completely reveal the secret key without touching the data path. Consequently
we developed design techniques for tampering resilient control structures, in our case
finite state machines.
In the following we discuss each of these contributions and provide an outlook on
future research opportunities.
Homomorphic Embedding
The homomorphic embedding of a finite field into a redundant ring provides the
opportunity to detect errors that may be the result of tampering. It has many ben-
efits that make it a good choice for performance critical applications with moderate
121
122 CHAPTER 9. CONCLUSION AND OUTLOOK
security needs. The overhead incurred by the larger operands and the additional op-
erations is minor and can be determined by the designer. Additionally the low-weight
form of the scaled modulus enables efficient modular reduction that can compete di-
rectly with other low-weight field moduli, for example those recommended by NIST
for the elliptic curve digital signature algorithm [oSN00]. The error checking pro-
cedure consists of a simple modular reduction that can be executed outside of the
critical path and thus does not affect the speed of the operation. The technique is in
principle applicable to fields of arbitrary characteristic, but since binary and prime
characteristic are the most commonly used field types, we only considered these here.
One of the limitations of homomorphic embedding is certainly the fact that not any
arbitrary field may be embedded, and not with an arbitrary amount of redundancy,
either. That said, a large choice of parameters exist that enable the designer to make
fine-grained trade-offs between the security level and the amount of redundancy—we
included a selection of suitable parameters in the appendix.
Future Research An interesting question regarding the generalization of cyclic
codes would certainly be if we can make any statement about their worst case error
detection capabilities, i.e. to provide a bound on their minimum distance properties.
Secondly, in our research we have focused mainly on error detection, while we did not
cover error correction (apart from an algorithm based fault tolerance perspective).
Nonlinear Arithmetic Codes
If we take a pessimistic point of view in terms of adversarial capabilities, linear codes
can never provide for adequate protection against errors that are induced precisely
and with a high degree of sophistication. In such a case it is absolutely necessary to
apply a non-linear scheme such as the one that we presented in chapter 6.
Nonlinear digit-based coding schemes can be problematic when it comes to design-
123
ing a predictor for a linear arithmetic operation such as addition or multiplication.
Particularly in the context of multi-precision arithmetic, carry propagation issues
need to be considered. In our case we were able to work around this issue by employ-
ing a check-symbol recomputation strategy, which ensures the integrity of operands
and their check-symbols at all times.
The powerful error detection capability that these codes possess comes at a relative
high cost in terms of check-symbol predictor and error detection network, and thus
may be only of interest to highly security-sensitive applications.
Future Research Our construction of nonlinear arithmetic codes can be applied
to protect any digit based integer arithmetic scheme, for example, an RSA modular
exponentiation in an integer ring. It does not currently apply to polynomial arithmetic
that would be of interest in the implementation of an elliptic curve cryptosystem over
binary extension fields. Another open question is if there exist other constructions
for the check-symbol prediction and error detection networks with a lower overhead.
Low-Cost Codes
Finally we presented a set of low-cost tamper-proof arithmetic primitives for opti-
mal extension fields. These are based on the original construction of robust codes
by Karpovsky and Taubin [KT04] and represent a compromise between robustness
and overhead for error detection. Optimal extension fields have been proposed in the
context of fast field arithmetic on low-cost embedded processors. The techniques pre-
sented here may be applied to both hard- and software implementations. Our analysis
of an elliptic curve scalar point multiplication shows that the overhead incurred is
minimal.
Future Research Recently, Baktır [BS06] and Saldamlı [Sal05] introduced spec-
tral techniques for fast multiplication in finite fields. Operands are converted into a
124 CHAPTER 9. CONCLUSION AND OUTLOOK
spectral representation by means of a number theoretic transform. Since after the
transformation the coefficients of the spectral representation of the operands are all
elements in GF(q), it appears to be possible to use robust codes to protect the spectral
arithmetic against adversarial faults.
Tamper Resilient Control Structures
Motivated by a hypothetical attack on a modular exponentiation based public key
accelerator, we explored the use of large minimum distance error detecting codes
for state machine encoding, along with other design considerations. We introduced
a novel design-metric and quantitively compared our approach to traditional N -
modular redundancy, with the result that our approach is more effective and scales
better. Following initial concerns of a reviewer about the overhead of our technique,
we also analyzed the ratio of control logic to data path in a typical cryptographic
circuit. We concluded that this ratio is typically very small and thus overhead in
control logic does not have a huge impact on the overall cost of the system, while
adding a valueable extra security margin.
Future Research Our work represents an initial step in the direction of fault
resilient control structures in this application domain. So far we only considered
finite state machines, but there are other control structures that present challenges of
their own, such as pipeline control circuits for security processors. Another direction
for future research may be to look into more suitable families of codes and potentially
online error recovery techniques.
125
General Open Issues
Not all questions pertaining to the issue of adversarial faults in cryptographic systems
could be addressed in this dissertation. In this section we would like to mention a
couple of research objectives that are of general relevance in this context.
A major issue raised by our work is the question of accuracy of our error and
attacker models. In general, this is a difficult question to answer, since it depends
on a large number of factors. The literature on this subject is quite sparse, so that
we had to rely on a various assumptions about the fault induction capabilities of a
determined individual and how these faults may manifest themselves in the circuit.
Since we do not possess the infrastructure for experimentally verifying these as-
sumptions, we had to extrapolate from only a few reports available, e.g. [SA02]. That
being said, we provide one important argument that makes our assumptions more rea-
sonable: in contrast to previous work on fault tolerant cryptographic circuits, which
only assume random faults, we model a determined adversary with strong capabilities.
Our rationale here is that it is generally better to err on the safe side. Nonetheless,
more work on a realistic model (or better yet: several realistic models for a variety of
attacker profiles) is urgently needed. A good first step in this direction was provided
by Lemke-Rust and Paar in [LRP06].
Another open question that our work raises is the issue of fault simulation. So
far, we determined the failure modes of our approaches analytically under either
the logical or the arithmetic error model. We did not, however, verify the error
models experimentally, e.g. by simulating random faults in a computer model of our
circuit. Such an experiment could give valuable insights into the relative performance
benefits of our linear and non-linear error detection schemes under a weaker, i.e.
random fault capable, attacker model. Unfortunately, fault simulations require a
certain infrastructure that was not available to us at the time, so we had to leave this
issue open.
126 CHAPTER 9. CONCLUSION AND OUTLOOK
Bibliography
[ABF+02] C. Aumu¨ller, P. Bier, W. Fischer, P. Hofreiter, and J.-P. Seifert. Fault
attacks on RSA with CRT: Concrete results and practical countermea-
sures. In B.S. Kaliski Jr., C¸.K. Koc¸, and C. Paar, editors, Proceedings of
4th International Workshop on Cryptographic Hardware and Embedded
Systems (CHES), number 2523 in Lecture Notes in Computer Science
LNCS, pages 260–275, Heidelberg, 2002. Springer-Verlag.
[AK96] R. Anderson and M. Kuhn. Tamper resistance - a cautionary note. In
Proceedings of the Second Usenix Workshop on Electronic Commerce,
pages 1–11. USENIX Assoc., USENIX Press, Nov 1996.
[Avi67] A. Avizienis. Design of fault-tolerant computers. In Proc. 1967 AFIPS
Fall Joint Computer Conf., volume 31 of AFIPS Conf. Proc., pages 733–
743, 1967.
[BBK+02] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri. On the
propagation of faults and their detection in a hardware implementation
of the advanced encryption standard. In M. Schulte, S. Bhattacharyya,
N. Burgess, and R. Schreiber, editors, Proceedings of the IEEE Inter-
national Conference on Application-Specific Systems, Architectures, and
Processors (ASAP), pages 303–314, San Jose, CA, USA, July 2002. IEEE
Computer Society Press.
127
[BDL97] D. Boneh, R. A. DeMillo, and R. J. Lipton. On the importance of check-
ing cryptographic protocols for faults. In W. Fumy, editor, Advances in
Cryptology - EuroCrypt’97, volume 1233 of Lecture Notes in Computer
Science, pages 37–51, Heidelberg, 1997. Springer. Proceedings.
[Ber04] M. Berg. Fault tolerant design techniques for asynchronous single event
upsets within synchronous finite state machine architectures. In 7th Inter-
national Military and Aerospace Programmable Logic Devices (MAPLD)
Conference. NASA, Sep 2004.
[BHLM01] M. Brown, D. Hankerson, J. Lo´pez, and A. Menezes. Software implemen-
tation of the NIST elliptic curves over prime fields. In D. Naccache, editor,
Topics in Cryptology — CT-RSA 2001, volume 2020 of Lecture Notes in
Computer Science, pages 250–265, Heidelberg, April 2001. Springer.
[BOS03] J. Blo¨mer, M. Otto, and J.-P. Seifert. A new CRT-RSA algorithm secure
against bellcore attacks. In CCS ’03: Proceedings of the 10th ACM con-
ference on Computer and communications security, pages 311–320, New
York, NY, USA, 2003. ACM Press.
[BOS06] J. Blo¨mer, M. Otto, and J.-P. Seifert. Sign change fault attacks on elliptic
curve cryptosystems. In L. Breveglieri, I. Koren, D. Naccache, and J.-
P. Seifert, editors, Fault Diagnosis and Tolerance in Cryptography 2006
(FDTC ’06), volume 4236 of Lecture Notes in Computer Science, pages
36–52, Heidelberg, Oct 2006. Springer.
[BP98] D. V. Bailey and C. Paar. Optimal extension fields for fast arithmetic in
public-key algorithms. In H. Krawczyk, editor, Advances in Cryptology -
CRYPTO 98, volume 1462 of Lecture Notes in Computer Science, pages
472–485, Heidelberg, August 1998. Springer.
128
[BP01] D. V. Bailey and C. Paar. Efficient arithmetic in finite field extensions
with application in elliptic curve cryptography. Journal of Cryptology,
14(3):153–176, 2001.
[BS97] E. Biham and A. Shamir. Differential fault analysis of secret key
cryptosystems. In B.S. Kaliski Jr., editor, Advances in Cryptology -
CRYPTO’97, number 1294 in Lecture Notes in Computer Science, pages
513–525, Heidelberg, 1997. Springer-Verlag.
[BS06] S. Baktır and B. Sunar. Finite field polynomial multiplication in the
frequency domain with application to elliptic curve cryptography. In 21st
International Symposium on Computer and Information Sciences (ISCIS
2006), volume 4263 of Lecture Notes in Computer Science LNCS, pages
991–1001, Heidelberg, Oct 2006. Springer-Verlag.
[CT91] M. Chen and E. A. Trachtenberg. Permutation codes for the state assign-
ment of fault tolerant sequential machines. In Proc. 10th Digital Avionics
Systems Conference, pages 85–89. IEEE/AIAA, Oct 1991.
[CT05] H. Choukri and M. Tunstall. Round reduction using faults. 2nd Inter-
national Workshop on Fault Diagnosis and Tolerance in Cryptography
(FDTC’05), Edinburgh, Scotland, September 2005.
[DT99] N. Das and N. A. Touba. Weight-based codes and their application to
concurrent error detection of multilevel circuits. In VTS’99, 1999.
[FV06] G. Fumaroli and D. Vigilant. Blinded fault resistant exponentiation.
In L. Breveglieri, I. Koren, D. Naccache, and J.-P. Seifert, editors, Fault
Diagnosis and Tolerance in Cryptography 2006 (FDTC ’06), volume 4236
of Lecture Notes in Computer Science, pages 62–70, Heidelberg, Oct 2006.
Springer.
129
[Gau02] G. Gaubatz. Versatile montgomery multiplier architectures. Master’s
thesis, Worcester Polytechnic Institute, Worcester, Massachusetts, May
2002.
[GHS02] P. Gaudry, F. Hess, and N. P. Smart. Constructive and destructive facets
of weil descent on elliptic curves. Journal of Cryptology, 15(1):19–46,
2002.
[GMO01] K. Gandolfi, C. Mourtel, and F. Olivier. Electromagnetic analysis: Con-
crete results. In C¸. K. Koc¸, D. Naccache, and C. Paar, editors, Cryp-
tographic Hardware and Embedded Systems–CHES 2001, volume 2162 of
Lecture Notes in Computer Science LNCS, pages 251–272, Heidelberg,
May 2001. Springer.
[Gor87] D. M. Gordon. Perfect multiple error-correcting arithmetic codes. Math-
ematics of Computation, 49(180):621–633, Oct 1987.
[GP02] J. Guajardo and C. Paar. Itoh-tsujii inversion in standard basis and
its application in cryptography. Design, Codes, and Cryptography,
25(2):207–216, Feb 2002.
[GS04] J. Großscha¨del and E. Savas¸. Instruction set extensions for fast arithmetic
in finite fields GF(p) and GF(2m). In M. Joye and J.-J. Quisquater, ed-
itors, Cryptographic Hardware and Embedded Systems–CHES 2004, vol-
ume 3156 of Lecture Notes in Computer Science, pages 133–147, Heidel-
berg, Aug 2004. Springer.
[GS05] G. Gaubatz and B. Sunar. Robust finite field arithmetic for fault-tolerant
public-key cryptography. In L. Breveglieri and I. Koren, editors, 2nd
Workshop on Fault Diagnosis and Tolerance in Cryptography - FDTC
2005, September 2005.
130
[GSK06] G. Gaubatz, B. Sunar, and M. G. Karpovsky. Non-linear residue codes
for robust public-key arithmetic. In L. Breveglieri, I. Koren, D. Naccache,
and J.-P. Seifert, editors, Fault Diagnosis and Tolerance in Cryptography
2006 (FDTC ’06), volume 4236 of Lecture Notes in Computer Science,
pages 173–184, Heidelberg, Oct 2006. Springer.
[GSS07] G. Gaubatz, E. Savas¸, and B. Sunar. Sequential circuit design for em-
bedded cryptographic applications resilient to adversarial faults. IEEE
Transactions on Computers, 2007.
[Har04] L. Hars. Long modular multiplication for cryptographic applications.
Cryptology ePrint Archive, Report 2004/198, 2004.
[HJSI75] A. L. Hopkins Jr. and T. B. Smith III. The architectural elements of a
symmetric fault-tolerant multiprocessor. IEEE Transactions on Comput-
ers, C-24:498–505, May 1975.
[HMV04] D. Hankerson, A. Menezes, and S. Vanstone. Guide to Elliptic Curve
Cryptography. Springer, Heidelberg, first edition, 2004.
[HS73] H. J. Helgert and R. D. Stinaff. Minimum-distance bounds for binary
linear codes. IEEE Transactions on Information Theory, 19(3):344–356,
May 1973.
[IT88] T. Itoh and S. Tsujii. A fast algorithm for computing multiplicative
inverses in GF(2m) using normal bases. Information and Computation,
78:171–177, 1988.
[JLQ99] M. Joye, A.K. Lenstra, and J.-J. Quisquater. Chinese remaindering based
cryptosystem in the presence of faults. Journal of Cryptology, 4(12):241–
245, 1999.
131
[JWK04] N. Joshi, K. Wu, and R. Karri. Concurrent error detection schemes for in-
volution ciphers. In M. Joye and J.-J. Quisquater, editors, Cryptographic
Hardware and Embedded Systems - CHES 2004, volume 3156 of Lecture
Notes in Computer Science LNCS, pages 400–412, Heidelberg, Aug 2004.
Springer.
[JY02] M. Joye and S. M. Yen. The montgomery powering ladder. In B.S.
Kaliski Jr., C¸.K. Koc¸, and C. Paar, editors, Cryptographic Hardware and
Embedded Systems–CHES 2002, volume 2523 of Lecture Notes in Com-
puter Science, pages 291–302. Springer-Verlag, 2002.
[KAK96] C¸.K. Koc¸, T. Acar, and B.S. Jr. Kaliski. Analyzing and comparing mont-
gomery multiplication algorithms. IEEE Micro, 16(3):26–33, June 1996.
[Ker83] A. Kerckhoff. La cryptographie militaire. Journal de Sciences Militaires,
IX:5–83, Jan 1883.
[KJJ99] P. C. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In
M. Wiener, editor, Advances in Cryptology–CRYPTO ’99, volume 1666
of Lecture Notes in Computer Science LNCS, pages 388–397. Springer,
Aug 1999.
[KKT04] M. G. Karpovsky, K. J. Kulikowski, and A. Taubin. Robust protection
against fault-injection attacks of smart cards implementing the advanced
encryption standard. In L. Simoncini, editor, Proc. Int. Conf. Dependable
Systems and Networks (DSN’04), pages 93–101. IEEE Computer Society,
IEEE Press, 2004.
[KKT05] K. Kulikowski, M. G. Karpovsky, and A. Taubin. Robust codes for fault
attack resistant cryptographic hardware. In L. Breveglieri and I. Koren,
132
editors, 2nd Int. Workshop on Fault Diagnosis and Tolerance in Cryp-
tography (FDTC’05), Sep 2005.
[KO63] A. Karatsuba and Y. Ofman. Multiplication of multidigit numbers by
automata. Soviet Physics-Doklady, 7:595–596, 1963.
[Koc96] P. C. Kocher. Timing attacks on implementations of diffie-hellman, rsa,
dss, and other systems. In CRYPTO ’96: Proceedings of the 16th Annual
International Cryptology Conference on Advances in Cryptology, pages
104–113, London, UK, 1996. Springer-Verlag.
[KT04] M. Karpovsky and A. Taubin. New class of nonlinear systematic error
detecting codes. IEEE Transactions on Information Theory, 50(8):1818–
1820, August 2004.
[LN83] R. Lidl and H. Niederreiter. Finite Fields. Addison-Wesley, Reading,
MA, 1st edition, 1983.
[LRP06] K. Lemke-Rust and C. Paar. An adversarial model for fault analy-
sis against low-cost cryptographic devices. In L. Breveglieri, I. Koren,
D. Naccache, and J.-P. Seifert, editors, Fault Diagnosis and Tolerance in
Cryptography 2006 (FDTC ’06), volume 4236 of Lecture Notes in Com-
puter Science, pages 131–143, Heidelberg, Oct 2006. Springer.
[Man67] D. Mandelbaum. Arithmetic codes with large distance. IEEE Transac-
tions on Information Theory, 13(2):237–242, April 1967.
[MM00] S. Mitra and E. J. McCluskey. Which concurrent error detection scheme
to choose? In Proc. Int. Test Conference (ITC), pages 985–994. IEEE,
IEEE Press, 2000.
133
[oSN00] National Institute of Standards and Technology (NIST). Digital Signa-
ture Standard (DSS). Number 186-2 in Federal Information Processing
Standards Publication. U.S. Department of Commerce, Jan 2000.
[O¨SS04] E. O¨ztu¨rk, B. Sunar, and E. Savas¸. Low-power elliptic curve cryptogra-
phy using scaled modular arithmetic. In M. Joye and J.-J. Quisquater,
editors, Workshop on Cryptographic Hardware and Embedded Systems–
CHES 2004, volume 3156 of Lecture Notes in Computer Science LNCS,
pages 92–106. Springer, Aug 2004.
[Par00] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs.
Oxford University Press, 2000.
[PQ03] G. Piret and J.-J. Quisquater. A differential fault attack technique
against SPN structures, with application to the AES and Khazad. In
C. D. Walter, C¸. K. Koc¸, and C. Paar, editors, Proc. Int. Workshop
on Cryptographic Hardware and Embedded Systems (CHES’03), volume
2779 of Lecture Notes in Computer Science, pages 77–88, Heidelberg,
2003. Springer-Verlag.
[Pra57] E. Prange. Cyclic error correcting codes in two symbols. Technical Re-
port AFCRC-TN-57-103, USAF Cambridge Research Laboratory, Bed-
ford, Mass., 1957.
[Pra86] D.K. Pradhan, editor. Fault Tolerant Computing – Theory and Tech-
niques, volume 1. Prentice-Hall, New Jersey, 1st edition, 1986.
[Pro89] I.K. Proudler. Idempotent AN codes. In IEE Colloquium on Signal Pro-
cessing Applications of Finite Field Mathematics, pages 8/1–8/5, London,
June 1989. IEE, IEE.
134
[QS01] J.-J. Quisquater and D. Samyde. Electromagnetic analysis (ema): Mea-
sures and counter-measures for smart cards. In I. Attali and T. Jensen,
editors, Smart Card Programming and Security, volume 2140 of Lecture
Notes in Computer Science LNCS, pages 200–210, Berlin / Heidelberg,
Sep 2001. Springer.
[Ren78] D.A. Rennels. Architectures for fault-tolerant spacecraft computers. Pro-
ceedings of the IEEE, 66(10):1255–1268, Oct 1978.
[RG71] T. R. N. Rao and O. N. Garcia. Cyclic and multiresidue codes for arith-
metic operations. IEEE Trans. Inf. Theory, 17(1):85–91, Jan 1971.
[RH02] A. Reyhani-Masoleh and M.A. Hasan. Error detection in polynomial basis
multipliers over binary extension fields. In B. S. Kaliski, Jr., C¸. K. Koc¸,
and C. Paar, editors, Cryptographic Hardware and Embedded Systems -
CHES 2002, volume 2523 of Lecture Notes in Computer Science, pages
515–528, Heidelberg, 2002. Springer. 4th International Workshop, Red-
wood Shores, CA, USA.
[RMH04] A. Reyhani-Masoleh and M. A. Hasan. Towards fault-tolerant crypto-
graphic computations over finite fields. ACM Transactions on Embedded
Computing Systems, 3(3):593–613, August 2004.
[SA02] S. P. Skorobogatov and R. J. Anderson. Optical fault induction attacks.
In B. S. Kaliski, Jr., C¸. K. Koc¸, and C. Paar, editors, Cryptographic
Hardware and Embedded Systems - CHES 2002, volume 2523 of Lecture
Notes in Computer Science, pages 2–12, Berlin, Heidelberg, New York,
August 2002. Springer-Verlag.
135
[Sal05] G. Saldamlı. Spectral Modular Arithmetic. PhD thesis, Department of
Electrical and Computer Engineering, Oregon State University, Corvallis,
OR, Jun 2005.
[Sha99] A. Shamir. Method and apparatus for protecting public key schemes from
timing and fault attacks. US Patent No. 5,991,415, 1999.
[van92] J. H. van Lint. Introduction to Coding Theory, volume 86 of Graduate
Texts in Mathematics. Springer, second edition, 1992.
[Wag04] D. Wagner. Cryptanalysis of a provably secure CRT-RSA algorithm.
In CCS ’04: Proceedings of the 11th ACM Conference on Computer and
Communications Security, pages 92–97, New York, NY, USA, 2004. ACM
Press.
[Wal92] C. D. Walter. Faster modular multiplication by operand scaling. In
J. Feigenbaum, editor, Advances in Cryptology - CRYPTO ’91: Proceed-
ings, volume 576 of Lecture Notes in Computer Science, pages 313–323,
Heidelberg, 1992. IACR, Springer.
[WHBG02] H. Wu, M. A. Hasan, I. F. Blake, and S. Gao. Finite field multi-
plier using redundant representation. IEEE Transactions on Computers,
51(11):1306–1316, November 2002.
[YJ00] S. M. Yen and M. Joye. Checking before output may not be enough
against fault-based cryptanalysis. IEEE Trans. Comp., 49(9):967–970,
Sep 2000.
136
Appendix
For practical purposes we present a selection of useful parameters for scaled em-
bedding. In all three tables the parameter n refers to the size in bits, respectively
degree, of the scaled modulus m = p ·s, while the parameter k is indicative of the size
(degree) of the field modulus of F to be embedded in the ring. Finally, the amount of
redundancy due to the scaling factor s is quantified by the difference n−k. Table A.1
provides parameters for prime field embedding, with k = dlog2 pe and n−k = dlog2 se,
while Tables A.2 and A.3 give parameters suitable for embedding binary extension
fields, where p(x) is of prime degree deg(p(x)) = k and the scaling factor s(x) has
degree deg(s(x)) = n− k.
n k u n-k
300 160 1 140
205 161 1 44
211 162 3 49
236 162 1 74
239 162 -1 77
232 163 1 69
209 166 -3 43
n k u n-k
203 168 -3 35
261 168 -1 93
208 171 3 37
227 172 -1 55
205 173 3 32
210 173 3 37
202 174 -3 28
n k u n-k
206 177 3 29
200 180 -3 20
221 181 -1 40
223 184 1 39
259 184 -1 75
233 186 -1 47
210 193 -3 17
n k u n-k
229 203 1 26
256 206 1 50
233 208 1 25
241 217 -1 24
248 227 1 21
251 232 1 19
Table A.1: Factorizations of m = p · s = 2n + u
137
n k n-k
173 163 10
190 163 27
202 163 39
264 163 101
209 179 30
235 191 44
308 191 117
334 191 143
239 193 46
306 211 95
390 211 179
371 239 132
391 239 152
452 251 201
412 263 149
n k n-k
482 269 213
419 277 142
495 277 218
587 311 276
605 311 294
470 313 157
355 337 18
446 337 109
544 337 207
604 337 267
669 337 332
578 349 229
590 349 241
674 349 325
468 353 115
n k n-k
475 359 116
662 359 303
456 367 89
728 373 355
407 379 28
401 389 12
626 389 237
724 389 335
492 397 95
559 397 162
623 397 226
715 401 314
458 409 49
827 419 408
746 443 303
n k n-k
862 443 419
786 449 337
831 457 374
920 461 459
630 463 167
760 463 297
618 467 151
577 479 98
748 491 257
763 503 260
764 503 261
849 503 346
957 503 454
553 521 32
779 521 258
Table A.2: Factorizations of m(x) = p(x) · s(x) = xn + x+ 1
n k n-k
235 167 68
283 179 104
199 181 18
207 199 8
319 199 120
405 223 182
357 229 128
281 233 48
n k n-k
291 251 40
383 269 114
321 281 40
417 281 136
543 281 262
295 293 2
341 293 48
509 317 192
n k n-k
381 373 8
473 389 84
463 401 62
617 409 208
477 431 46
521 431 90
471 433 38
615 457 158
n k n-k
765 457 308
823 457 366
785 463 322
965 487 478
675 503 172
Table A.3: Factorizations of m(x) = p(x) · s(x) = xn + x2 + 1
138
