Threshold Implementations of the Present Cipher by Farmani, Mohammad
Worcester Polytechnic Institute
Digital WPI
Masters Theses (All Theses, All Years) Electronic Theses and Dissertations
2017-09-06
Threshold Implementations of the Present Cipher
Mohammad Farmani
Worcester Polytechnic Institute
Repository Citation
Farmani, Mohammad, "Threshold Implementations of the Present Cipher" (2017). Masters Theses (All Theses, All Years). 1024.
https://digitalcommons.wpi.edu/etd-theses/1024
  
Threshold Implementations of the Present Cipher
by
Mohammad Farmani
A Thesis
Submitted to the Faculty of the
WORCESTER POLYTECHNIC INSTITUTE
In partial fulfillment of the requirements for the
Degree of Master of Science
in
Electrical and Computer Engineering
July 2017
APPROVED:
Professor Thomas Eisenbarth, Major Advisor
Professor Berk Sunar, Thesis Committee
Professor Alexander Wyglinski, Thesis Committee
Professor John A. McNeill, Department Head
Abstract
Securing data has always been a challenge because it bears directly on the
safety of people and society. Many cryptographic algorithms have been
developed to solve security problems. However, some applications have
constraints that make it difficult to achieve high levels of security.
Lightweight cryptography aims to address this issue while keeping costs low.
Side-channel attacks have changed the practice of cryptography significantly.
In this kind of attack, the attacker has physical access to the cryptosystem
and can extract sensitive data by monitoring and measuring side channels
such as power consumption, electromagnetic emanation, timing information,
and sound. These attacks exploit the relationship between the side channels
and the secret data. Countermeasures are therefore needed that eliminate or
reduce side-channel leakage, or that break the relationship between the side
channels and the secret data, in order to protect cryptosystems against
side-channel attacks.
In this work, we explore the practicality of Threshold Implementation (TI)
with only two shares, aiming for a smaller design that needs less randomness
but is still leakage resistant. We demonstrate the first two-share Threshold
Implementations of the lightweight block cipher Present. Based on
implementation results, two-share TI has a lower area overhead and better
throughput than a first-order resistant three-share scheme. Leakage analysis
of the developed implementations reveals that two-share TI can retain
perfect first-order resistance. However, the analysis also exposes a strong
second-order leakage.
Acknowledgements
I would like to express my gratitude first and foremost to my thesis advisor
Thomas Eisenbarth for assistance, patience, and support during this research.
This research would not have been possible without the assistance of
Cong Chen with the data analysis. Research results presented in this thesis
have resulted in a joint publication with Cong Chen [11].
Finally, I would like to extend my deepest gratitude to my parents Ab-
dollah Farmani and Azam Meskarian; my brothers Mojtaba Farmani and Ali
Farmani without whose love, understanding and support I could never have
completed this Master’s degree.
This research was supported in part by the National Science Foundation
under Grant No. 1261399.
Contents
1 Introduction
1.1 Motivation
1.2 Our Contribution
1.3 Thesis Outline
2 Background
2.1 Symmetric Key Cryptography
2.1.1 Present
2.2 Asymmetric Key Cryptography
2.3 Physical Attacks
2.3.1 Side-channel Attacks
Simple Power Analysis (SPA)
Differential Power Analysis (DPA)
3 Two-Share Threshold Implementation Analysis
3.1 Threshold Implementation with Two Shares
3.1.1 Potential Risks
4 Design and Implementation
4.1 Application to Present
4.1.1 Present with Two Shares
4.1.2 Hardware Implementation
5 Results
5.1 Implementation Results
5.1.1 Theoretical Analysis
5.1.2 Practical Analysis
6 Conclusion
6.1 Future Work
Chapter 1
Introduction
1.1 Motivation
Making cryptographic algorithms secure against side-channel analysis usually
incurs a significant area overhead. Moreover, some countermeasures are
complex to apply at the hardware level, and implementation mistakes can
leave the countermeasure partially insecure [12, 21, 30].
Threshold Implementation (TI) is a scheme that has become popular in this
context. Unlike other schemes [41, 21], TI is straightforward and does not
require a change in the design flow, which allows it to be applied easily to
a wide range of ciphers while reducing implementation errors. An additional
advantage of TI is that the implementation is resistant to first-order
side-channel attacks and can be extended to protect against higher-order
side-channel attacks [31, 6].
A drawback of TI is that, like most other masking schemes, it causes large
area and time overheads and often requires large amounts of randomness for
remasking, which makes practical applications difficult. The considerable
increase in area and the requirement for a high-performance random number
generator make TI too expensive for a wide range of applications.
TI has been generalized by Reparaz et al. to provide protection against
higher-order attacks while reducing the number of shares to d + 1, where d
is the desired protection order. Thus, two shares are sufficient for
first-order protection. De Cnudde et al. carried out the first evaluation of
d + 1 shares for the AES cipher [15].
1.2 Our Contribution
Although the practicality of first-order resistant three-share TI for
lightweight ciphers has already been explored in [35] [14], we develop a
two-share Threshold Implementation (2-TI) of the Present cipher [33] and
explore the practical implications of reducing the number of required shares
from three to two for first-order resistance. Moving to 2-TI reduces the area
overhead and the required randomness in a ratio of three to two. This
reduction makes TI relevant to a wider range of applications that need
side-channel protection.
The algebraic depth of the nonlinear functions of lightweight ciphers is
low, which makes them well suited to TI. This feature results in cheap and
efficient masking and minimizes the need for additional randomness.
Additionally, our design does not require remasking during the round
functions. This significantly reduces the amount of required randomness in
comparison to other masked implementations such as AES, which requires more
than 8,000 fresh random bits for one block encryption [14].
Applying 2-TI to nonlinear functions (S-boxes) is more difficult and usually
needs additional pipeline stages, which has adverse effects on
implementation size and latency. Furthermore, Cong Chen's analysis [11]
reveals a strong second-order leakage both theoretically and practically.
1.3 Thesis Outline
In Chapter 2 we introduce the relevant terminology and methods concerning
cryptographic algorithms, side-channel attacks, and ways of protecting
against them. We also describe a lightweight cipher, namely Present, in more
detail. We then present the theoretical discussion of two-share TI in
Chapter 3. The protected version of Present is introduced in Chapter 4. We
show the implementation results and analysis in Chapter 5 and conclude the
work in Chapter 6.
Chapter 2
Background
In the upcoming era of computing, smart devices will be limited in memory,
computing power, battery supply, and bandwidth, and they will be vulnerable
to attacks [17, 27]. For example, the Internet of Things (IoT) has brought
many benefits but also raises security and privacy problems [1]. At the same
time, the tradeoffs between performance, security, and cost are highly
important [27]. Because of these constraints, many studies apply lightweight
cryptography (LWC) [33, 26, 19] to find the best compromise between
security, power consumption, performance, and footprint. In recent years,
several lightweight block ciphers have been proposed, including PRIDE,
PRESENT, CLEFIA, PRINCE, KLEIN, SIMON, and SPECK [1].
Area-minimal implementations of cryptography are highly desirable for a
large class of embedded systems. Hence, many area-efficient crypto cores
have been proposed, most of them based on lightweight block cipher designs
such as Present, Simon, Speck, or Katan. A feature common to most
area-efficient hardware is serialization: instead of performing the same
task several times in parallel, one instance of the task is applied to
different inputs in successive clock cycles. Serialization reduces the area
of an implementation at the cost of increased run time. To produce the same
output, area-critical functions are divided into sub-functions that can be
applied repeatedly. For instance, the S-box layer of a block cipher is
usually difficult to minimize in hardware because of its high nonlinearity.
A typical area-optimized implementation of an S-box-based cipher uses only
one S-box, which is applied to the different parts of the intermediate state
consecutively. For example, with a 64-bit intermediate state and a 4-bit
S-box, the S-box must be applied 16 times to cover all 64 bits of the state.
This vertical type of serialization is supported by almost all
state-of-the-art block ciphers, which use a single S-box (unlike DES, which
uses 8 different S-boxes).
To reduce the size of large S-boxes, which generally have more complex
algebraic functions, other techniques are applied: the S-box is divided into
sub-functions that are chained together. In [34], Canright proposes
implementations that compute the AES S-box by taking advantage of a tower
field representation. [34] also shows a decomposition of the Present S-box
into mappings of algebraic degree 2, which eases side-channel protection and
reduces the size at the expense of doubling the computation time. This type
of serialization is referred to as horizontal. It is determined by the
algebraic complexity of the nonlinear layer, while the degree of vertical
serialization is chosen at implementation time and is usually bounded by the
number of S-boxes.
The data path sizes of area-efficient, vertically serialized hardware
implementations range from 1 bit for Simon or Katan and 4 bits for Present
up to 8 bits for AES. That is, 8 bits, 4 bits, and 1 bit of the cipher state
are updated per cycle for AES, Present, and Simon or Katan, respectively.
Although serial data paths reduce the combinational logic of the crypto core
to low single-digit percentages of the entire design [37, 13], they also
increase the latency of the crypto core significantly. In other words, in
applications where latency is not critical, the registers storing the key
and state largely determine the area of a cipher. Hence, considerable area
savings can be obtained by relaxing this memory constraint, for instance by
placing state and key in dedicated memory such as block RAMs [20] or shift
registers [2] on FPGAs, or by externalizing key storage [13].
Since the remainder of this work uses Present for proof-of-concept
implementations, we describe this cipher in more detail in this chapter
after a brief overview of ciphers in general. Ciphers are divided into two
major groups, symmetric and asymmetric.
2.1 Symmetric Key Cryptography
In symmetric key cryptography (also called secret-key or shared-key
cryptography), the sender and receiver share a common key that is used for
both encryption and decryption [26]. If the key is compromised, an attacker
can easily decrypt the data. The advantage of this type of cryptography is
that it is fast and does not need many resources [17]. Symmetric key
encryption comes in two forms, block ciphers and stream ciphers [42]. A
block cipher divides the data into blocks, while a stream cipher processes
the data in units as small as single bits and encrypts them after
randomization [42]. Examples of symmetric key encryption techniques are DES,
Triple DES, AES, Present, and Blowfish [29].
The Data Encryption Standard (DES) is a symmetric key algorithm developed by
IBM. DES operates on blocks of 64 bits with a key size of 56 bits. It uses
both permutations and substitutions. Earlier studies have shown that DES is
no longer secure against attacks [23].
The operation of Triple DES is similar to that of DES. It uses three 64-bit
keys, and the encryption procedure is the same as in DES but is applied
three times: the data is encrypted with the first key, decrypted with the
second key, and finally encrypted with the third key [23].
DESL and DESXL are improved variants of the DES algorithm. Substitution
boxes (S-boxes) can take up 32% of the area in DES implementations. The gate
complexity of DES can be decreased by replacing the eight original S-boxes
with a single new one. In this way, seven S-boxes as well as the multiplexer
are eliminated. This lightweight DES variant is called DESL, and its chip is
20% smaller than that of DES. Its design enables DESL to resist common
attacks such as linear and differential cryptanalysis and the Davies-Murphy
attack [17]. Applying key whitening to the cipher results in the DESXL
cipher, whose security level reaches 118 bits.
2.1.1 Present
Present is a block cipher proposed in 2007 and optimized for low area
overhead [8]. It is a substitution-permutation network featuring a 4 × 4 bit
S-box and a permutation layer that only rewires bits, making it low cost in
hardware. It has a block size of 64 bits, a key size of 80 or 128 bits, and
31 rounds. Present has been optimized for many application scenarios;
area-minimal implementations use a 4-bit data path. It has also been
standardized as a lightweight cryptographic block cipher in ISO/IEC
29192-2:2012. Each round of Present consists of three steps: a key-addition
layer, a substitution layer, which is a nonlinear function, and a
permutation layer. A general view of the Present algorithm is shown in
Figure 2.1. The individual layers of Present are described below.
Figure 2.1: General view of Present algorithm
addRoundKey In each round, the following operation is performed on the
current state $s_{63} s_{62} \ldots s_0$ and the round key
$K^i = k^i_{63} k^i_{62} \ldots k^i_0$, for $1 \le i \le 32$:
$s_j = s_j \oplus k^i_j$ for $0 \le j \le 63$
sBoxLayer The S-box in Present is a nonlinear 4-bit to 4-bit function
$S: \mathbb{F}_2^4 \to \mathbb{F}_2^4$, shown in the following table in
hexadecimal notation. The substitution layer can be implemented with 16
parallel S-boxes or with a single S-box used 16 times, depending on the
application requirements.
x    0 1 2 3 4 5 6 7 8 9 A B C D E F
S(x) C 5 6 B 9 0 A D 3 E F 8 4 7 1 2
pLayer The permutation is straightforward: it rewires all 64 bits of the
state according to the following table. Bit i of the state is moved to bit
position P(i).
i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
P (i) 0 16 32 48 1 17 33 49 2 18 34 50 3 19 35 51
i 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
P (i) 4 20 36 52 5 21 37 53 6 22 38 54 7 23 39 55
i 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
P (i) 8 24 40 56 9 25 41 57 10 26 42 58 11 27 43 59
i 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
P (i) 12 28 44 60 13 29 45 61 14 30 46 62 15 31 47 63
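The three layers described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design of this thesis: the state is assumed to be held in a 64-bit integer with bit 0 as the least significant bit, all function names are hypothetical, and the closed form P(i) = 16·i mod 63 (with P(63) = 63) is an observation that reproduces the table above rather than something stated in the text.

```python
# Illustrative model of one Present round on a 64-bit state stored in an int.
# Function names and the integer bit ordering (bit 0 = LSB) are assumptions.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def add_round_key(state, round_key):
    """addRoundKey: bitwise XOR of the 64-bit state and the round key."""
    return state ^ round_key


def sbox_layer(state):
    """sBoxLayer: apply the 4-bit S-box to each of the 16 nibbles."""
    out = 0
    for nibble in range(16):
        x = (state >> (4 * nibble)) & 0xF
        out |= SBOX[x] << (4 * nibble)
    return out


def p(i):
    """Bit permutation table in closed form: 16*i mod 63, and 63 -> 63."""
    return 63 if i == 63 else (16 * i) % 63


def p_layer(state):
    """pLayer: move bit i of the state to bit position P(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << p(i)
    return out


def present_round(state, round_key):
    """One full round: key addition, substitution, permutation."""
    return p_layer(sbox_layer(add_round_key(state, round_key)))
```

A serialized implementation would call the S-box once per clock cycle on a single nibble instead of looping over all 16 nibbles at once; the loop in `sbox_layer` corresponds to the 16 sequential S-box evaluations mentioned above.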
The key schedule The key can be 80 or 128 bits long; we use an 80-bit key in
this work. In addRoundKey, the 64 leftmost bits of the current contents of
the key register, $k_{79} k_{78} \ldots k_{16}$, are used as the round key.
After the round key is extracted, the 80-bit key register
$K = k_{79} k_{78} \ldots k_0$ is updated by rotating, applying the S-box,
and XORing with the round counter as follows. The key register is rotated to
the left by 61 bit positions, and the S-box is then applied to the four
leftmost bits of the register. Finally, the round-counter value i is XORed
with five specific bits of K, $k_{19} k_{18} k_{17} k_{16} k_{15}$.
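The 80-bit key schedule can be sketched as follows. This is a minimal software model under the assumption that the key register is held in a Python integer with $k_0$ as the least significant bit; the function name `round_keys_80` is hypothetical.

```python
# Illustrative model of the 80-bit Present key schedule.
# The key register is an int with k0 as the least significant bit (assumption).

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def round_keys_80(key, rounds=32):
    """Return the list of 64-bit round keys K1..K32 for an 80-bit key."""
    keys = []
    for counter in range(1, rounds + 1):
        # Round key: the 64 leftmost bits k79..k16 of the register.
        keys.append(key >> 16)
        # 1. Rotate the 80-bit register left by 61 positions.
        key = ((key << 61) | (key >> 19)) & MASK80
        # 2. Apply the S-box to the four leftmost bits k79..k76.
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))
        # 3. XOR the round counter into bits k19..k15.
        key ^= counter << 15
    return keys
```

For the all-zero key, the first round key is zero, and the second round key has only its top nibble set to S(0) = C, since the round counter 1 lands in bit $k_{15}$, just below the extracted 64 bits.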
2.2 Asymmetric Key Cryptography
In this technique, different keys are used for encryption and decryption.
One key is public (published) while the other is private. Public-key methods
are important because they allow encryption keys or other data to be sent
securely even when the two users cannot agree on a secret key in private
beforehand [23].
Figure 2.2: Hardware architecture of the Present Cipher
Among public-key algorithms, there are three established families: ECC, RSA
(Rivest-Shamir-Adleman), and discrete logarithm schemes [17]. ECC is
considered the most attractive because of its shorter operand length and
lower computational requirements [17]. The most commonly used public-key
algorithm is RSA, which can be used for both encryption and digital
signatures [23]. The key size in this algorithm should be greater than 1024
bits for a reasonable level of security [23].
2.3 Physical Attacks
Cryptographic algorithms are usually designed to resist algebraic
cryptanalysis. However, most of them do not take physical attacks into
account. Physical attacks can be divided into two classes: passive attacks
and active attacks. Active attacks analyze the chip at the logic level and
disturb its operation or reverse-engineer its functions. Passive attacks,
also known as side-channel attacks (SCA), record acoustic, power, or
electromagnetic emanations while a cryptographic core is running on the
device.
2.3.1 Side-channel Attacks
When Kocher et al. [22] published their study on differential power analysis
(DPA), it became publicly known that analyzing power traces obtained while a
cryptographic primitive is running can reveal secret information. A few
years later, correlation power analysis (CPA) was adopted in preference to
plain DPA because it is more efficient and needs fewer traces [9]. In DPA,
the idea is to recover part of the secret key by targeting an intermediate
state of the algorithm and predicting its value under hypotheses on the
portion of the key involved [1]. The link between the predictions and the
traces is then uncovered using the Pearson correlation coefficient between
these two variables. Usually, a leakage model based on the Hamming weight or
the Hamming distance is used [1].
The general idea behind attacks based on power consumption is explained in
detail in [38]. When the crypto device consumes power, the current through
the device changes. This current can be calculated by measuring the voltage
$V_0$ across a resistor placed in series with the crypto device and dividing
it by the resistance $R$. The power consumption of the device can then be
calculated as:
$\mathrm{Power}(\mathrm{CryptoDevice}) = V_{cc} \times \frac{V_0}{R}$ (2.1)
As Equation (2.1) shows, the power consumption of the device is proportional
to $V_0$, so an oscilloscope that records $V_0$ captures a signal
proportional to the power consumption of the device. Two types of attacks
that use power consumption are simple power analysis and differential power
analysis.
Simple Power Analysis (SPA)
Simple power analysis (SPA) is a method that involves direct interpretation
of power consumption measurements collected during cryptographic operations
[22]. A trace is a set of power consumption measurements taken across a
cryptographic operation. Variations in the power measurements along the
trace are caused by the different instructions executed during the
cryptographic algorithm. For example, Figure 2.3 shows an SPA trace from a
typical smart card as it performs a DES operation [22]. The sixteen rounds
are clearly visible.
Figure 2.3: SPA trace showing an entire DES operation [22]
Figure 2.4: An SPA trace showing round 2 and 3 [22]
Figure 2.4 is a more detailed view of the same trace showing the second and
third of the 16 DES rounds [22]. More details are visible. For instance, the
left arrow marks a single rotation of the 28-bit DES key registers C and D
in round 2, and the right arrows mark two rotations in round 3. Since SPA
can reveal the sequence of executed instructions, and the execution path
depends directly on the processed data, SPA can be used to break a
cryptographic implementation.
In a simple power analysis attack, we assume that the attacker has access to
only one or a few measurements. For the attack to succeed, the attacker must
also know the details of the implementation.
Differential Power Analysis (DPA)
Differential power analysis attacks do not need detailed information about
the attacked crypto device, but in contrast to SPA attacks they require a
large number of power traces. The attacker therefore needs physical access
to the crypto device in order to record them. DPA is a statistical method
for analyzing sets of measurements to identify data-dependent correlations.
It involves partitioning a set of traces into subsets and then computing the
difference between the averages of these subsets. If the choice of subsets
is correlated with the trace measurements, the difference of the averages
approaches a non-zero value. If enough traces are available, even small
correlations can be isolated.
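The partition-and-average idea can be illustrated on simulated data. The sketch below is a toy model, not a measurement setup from this thesis: the traces are synthetic Gaussian noise, the leakage at one sample is assumed to be the Hamming weight of the Present S-box output, and the key nibble, noise level, and all function names are hypothetical choices made for illustration.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def simulate_traces(key, n_traces, n_samples, leak_at, noise, rng):
    """Toy traces: noise at every sample, plus HW(S(d ^ key)) at one sample."""
    data, traces = [], []
    for _ in range(n_traces):
        d = rng.randrange(16)
        trace = [rng.gauss(0.0, noise) for _ in range(n_samples)]
        trace[leak_at] += bin(SBOX[d ^ key]).count("1")
        data.append(d)
        traces.append(trace)
    return data, traces


def difference_of_means(data, traces, key_guess, bit=0):
    """Split traces by one predicted S-box output bit and subtract the means."""
    n_samples = len(traces[0])
    sums = [[0.0] * n_samples, [0.0] * n_samples]
    counts = [0, 0]
    for d, trace in zip(data, traces):
        selector = (SBOX[d ^ key_guess] >> bit) & 1
        counts[selector] += 1
        for t, value in enumerate(trace):
            sums[selector][t] += value
    return [sums[1][t] / counts[1] - sums[0][t] / counts[0]
            for t in range(n_samples)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
data, traces = simulate_traces(key=0xB, n_traces=2000, n_samples=5,
                               leak_at=2, noise=0.5, rng=rng)
diff = difference_of_means(data, traces, key_guess=0xB)
```

Under this toy model, the differential trace for the correct guess is expected to show a clear peak at the leaking sample and values near zero elsewhere. Repeating the computation for every key guess and comparing the resulting differential traces is what allows an attacker to rank the guesses.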
DPA attacks analyze the power consumption at fixed moments in time as a
function of the processed data, using a sufficiently large number of power
traces [28]. A DPA attack consists of the following steps:
1. Choosing an Intermediate Point of the Algorithm to Attack:
In this step, we choose an intermediate result of the cryptographic
algorithm executed on the crypto device that depends on known data d and on
a part of the key k. In such attacks, d can be plaintext or ciphertext.
2. Power Consumption Measurement: The next step is to measure the power
consumption of the crypto device while the cryptographic algorithm is
executed D times. For each execution, the attacker must know the
corresponding data value $d_i$; together these form the vector
$d = (d_1, d_2, \ldots, d_D)$. For each execution, the attacker records a
power trace $t_i = (t_{i,1}, \ldots, t_{i,T})$ corresponding to $d_i$, where
T denotes the number of samples per trace. The measured traces can therefore
be written as a matrix T of size $D \times T$.
3. Simulation of Hypothetical Intermediate Value: For each data value $d_i$
and every possible value $k_j$ of the key part k, we compute the
hypothetical intermediate value. With K possible key values, the result is a
matrix V of size $D \times K$, where:
$v_{i,j} = f(d_i, k_j), \quad i = 1, \ldots, D, \quad j = 1, \ldots, K$ (2.2)
4. Modeling the Power Consumption Values of Intermediate Values: In this
step, we derive hypothetical power consumption values from the intermediate
values. Several models can be used for this purpose, such as Least
Significant Bit (LSB), Most Significant Bit (MSB), Hamming Weight (HW), and
Hamming Distance (HD). LSB and MSB consider the rightmost bit and the
leftmost bit, respectively. The Hamming Weight model counts the number of
bits with value 1 in the result. The Hamming Distance model counts the
number of bit transitions that occur from one state to the next. The
elements $v_{i,j}$ of matrix V are mapped to elements $h_{i,j}$ to form the
matrix H:
$h_{i,j} = \mathrm{Power\_model}(v_{i,j})$
The attacker's knowledge of the target device has a great impact on the
effectiveness of the attack: the more precisely the power model matches the
actual power traces, the better the attack performs. Hamming Distance and
Hamming Weight are the two most widely used power models.
5. Comparing the Power Models with Actual Power Traces: In this step, the
power model matrix H and the measured power trace matrix T are compared.
Each column $h_i$ of H, i.e., the hypothetical power consumption for one key
hypothesis, is compared with each column $t_j$ of T, i.e., the measurements
at one point in time. The result is a matrix R of size $K \times T$ whose
element $r_{i,j}$ holds the result of comparing $h_i$ with $t_j$. The
indices of the highest value in R reveal two important pieces of
information: the column index gives the point in time at which the
intermediate result is processed, and the row index identifies the key that
is actually used by the algorithm on the crypto device.
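The five steps above can be walked through on synthetic data. The following sketch is an assumption-laden toy example rather than the attack setup used in this thesis: the target is a single Present S-box output nibble, the traces are simulated with Hamming weight leakage plus Gaussian noise, the comparison in step 5 is done with the Pearson correlation coefficient, and the key value, trace count, and all names are hypothetical.

```python
import math
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def dpa_attack(data, traces):
    """Steps 3 to 5: build V and H per key guess and correlate with T."""
    n_samples = len(traces[0])
    trace_columns = [[trace[t] for trace in traces] for t in range(n_samples)]
    result = []  # matrix R with K = 16 rows and T columns
    for guess in range(16):
        # Step 3: v = S(d ^ k); step 4: h = HW(v) for this key hypothesis.
        h = [hamming_weight(SBOX[d ^ guess]) for d in data]
        # Step 5: compare the hypothesis column with every time sample.
        result.append([pearson(h, column) for column in trace_columns])
    return result


# Steps 1 and 2, simulated: target S(d ^ k), record D = 1000 toy traces.
rng = random.Random(7)  # fixed seed for reproducibility
TRUE_KEY, LEAK_AT, N_SAMPLES = 0x6, 3, 6
data = [rng.randrange(16) for _ in range(1000)]
traces = []
for d in data:
    trace = [rng.gauss(0.0, 0.5) for _ in range(N_SAMPLES)]
    trace[LEAK_AT] += hamming_weight(SBOX[d ^ TRUE_KEY])
    traces.append(trace)

R = dpa_attack(data, traces)
best_key, best_time = max(
    ((k, t) for k in range(16) for t in range(N_SAMPLES)),
    key=lambda kt: R[kt[0]][kt[1]],
)
```

The row and column of the largest entry of `R` give the recovered key nibble and the time sample at which the intermediate value leaks. Taking the signed maximum rather than the absolute value is a design choice here: a Hamming weight model predicts a positive correlation for the correct key.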
Figure 2.5: Block Diagram of DPA attack
Chapter 3
Two-Share Threshold
Implementation Analysis
Threshold Implementation is based on (n, n) secret sharing, which means that
all n shares are required to obtain the secret value. In such a scheme, even
the knowledge of up to n − 1 shares does not reveal any information about
the secret value. The sharing of a secret value X is denoted by
$\hat{X} = (X_1, X_2, \ldots, X_n)$. Shares are generated with the XOR
function. To divide X into m shares, we need m − 1 random numbers, each of
which is used directly as one of the shares. The m-th share is computed by
XORing all of the random numbers and the secret value X together.
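The sharing procedure described above can be sketched as follows. The function names `share` and `unshare` and the default width of 4 bits are hypothetical choices for illustration; the standard-library `secrets` module stands in for whatever random number generator a real design would use.

```python
import secrets
from functools import reduce
from operator import xor


def share(secret, m, bits=4):
    """Split `secret` into m Boolean shares using m - 1 random numbers."""
    randoms = [secrets.randbits(bits) for _ in range(m - 1)]
    # The last share is the XOR of all random numbers and the secret.
    last = reduce(xor, randoms, secret)
    return randoms + [last]


def unshare(shares):
    """Recombine the shares: the XOR of all m shares gives the secret."""
    return reduce(xor, shares, 0)
```

Any subset of m − 1 shares consists of values that are uniformly random and independent of the secret, which is why all m shares are needed to recover X.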
Many countermeasures have been proposed against side-channel attacks. When
applied to hardware, however, many of them still leaked because of glitches.
To address this problem, Nikova et al. [32] presented the Threshold
Implementation (TI) technique. The first proposal of this technique protects
against first-order side-channel leakage. Threshold Implementation has shown
broad applicability: a wide range of crypto algorithms, from symmetric
algorithms [34, 31, 5, 6, 39] to asymmetric algorithms [10, 36], have been
protected successfully. The technique has also been extended to protect
against higher-order attacks [7], and drawbacks of threshold implementation
are addressed in [35]. Threshold Implementation requires three properties to
make an implementation side-channel resistant in the presence of glitches.
Using additive Boolean masking, i.e., adding randomness, sensitive states
are transformed into a shared representation. Functions F(.) are modified to
meet the requirements of correctness, uniformity, and non-completeness.
Figure 3.1: Simple function
Correctness: The output is divided into n shares. Correctness means that
combining all of these output shares yields the valid output:
$(x_1, y_1) \oplus (x_2, y_2) \oplus \cdots \oplus (x_n, y_n) = (x, y)$
$z_1 \oplus z_2 \oplus \cdots \oplus z_n = z$ (3.1)
Non-Completeness: For first-order side-channel resistance, every
sub-function of a shared function F that computes an output share must be
independent of at least one input share. In [7] it is shown that to obtain
d-th order SCA resistance, any combination of up to d sub-functions must be
independent of at least one input share. Since all shares are needed to
reveal the secret, this property ensures that no single sub-function ever
processes all of the shares. Hence, glitches in the final implementation
cannot leak information about the secret.
Uniformity: If the input shares are uniformly distributed, all intermediate
states and the output shares must be uniformly distributed as well. For
first-order resistance, this property ensures that the mean leakage is
independent of the unshared value.
3.1 Threshold Implementation with Two Shares
The method of Nikova et al. [32] allows algebraic functions of any degree d
to be implemented in a straightforward way. However, for functions of higher
degree, considerably more effort is needed to keep the number of shares at
the minimum possible, i.e., three shares. An improved approach that uses
fewer shares is offered in [35].
Figure 3.2: Shares of function
Figure 3.3: Non-completeness in TI, every input share goes to other n − 1
shares
The efficient implementation of 4-bit S-boxes with three shares was
investigated in [24]. Threshold implementations of the AES S-box use its
algebraic structure to obtain area-efficient S-boxes with four shares [31]
or five shares [6].
The approach discussed in [35] uses techniques similar to those of the above
papers, but with just two shares, which reduces the area overhead and the
need for randomness.
Linear operations are simple to implement in shared form, since the
operation can be applied to each share separately. This approach is used in
the linear parts of different threshold implementations [6, 10, 4].
In the following, the simplest nonlinear function c = ab is shown with two
shares:
$c_0 = a_0 b_0, \quad c_1 = a_1 b_1, \quad c_2 = a_0 b_1, \quad c_3 = a_1 b_0$ (3.2)
In the equation above, $c_2$ and $c_3$ combine inputs from different shares.
This may violate the non-completeness property if a and b are dependent.
Hence, a and b must be statistically independent.
The 4-share output of Equation (3.2) is not desirable from an
area-efficiency perspective. To reduce the number of shares, the shares can
be combined in the next cycle, e.g., $c'_0 = c_0 + c_2$ and
$c'_1 = c_1 + c_3$. To meet the non-completeness property, a register stage
must be placed before this recombination, so that it takes place in the next
clock cycle. This increases the number of registers as well as the latency.
As discussed in [35], the proliferation of shares becomes more complicated
for higher-degree functions. To achieve a minimal hardware implementation,
higher-degree algebraic functions are broken into building blocks of minimal
degree, which avoids the share proliferation problem.
To ensure uniformity and obtain a basic nonlinear building block, we
implement z = ab + c in two pipeline stages as:
$z'_0 = a_0 b_0 + c_0, \quad z'_1 = a_1 b_1 + c_1, \quad z_0 = z'_0 + a_0 b_1, \quad z_1 = z'_1 + a_1 b_0$ (3.3)
$z'_i$ and $z_i$ are computed in different cycles. Both $z'_i$ and $z_i$ are
uniform without additional effort. Unlike (3.2), this construction only
needs to store two intermediate states. Using extra register stages for the
nonlinear function increases both area overhead and latency in proportion to
the number of register stages needed. If the data path of the implementation
is small enough, this latency can be compensated.
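Correctness and output uniformity of the two-stage construction in Equation (3.3) can be checked exhaustively for single-bit inputs. The sketch below is a software model only: it cannot capture glitches or register timing, the comment marking the register stage is purely descriptive, and the function names are hypothetical.

```python
from itertools import product


def shared_mul_add(a0, a1, b0, b1, c0, c1):
    """Two-share z = ab + c following Equation (3.3), on single bits."""
    # Stage 1: each sub-function touches only one share index.
    z0_prime = (a0 & b0) ^ c0
    z1_prime = (a1 & b1) ^ c1
    # (In hardware, a register stage would separate the two stages here.)
    # Stage 2: add the cross terms a0*b1 and a1*b0.
    z0 = z0_prime ^ (a0 & b1)
    z1 = z1_prime ^ (a1 & b0)
    return z0, z1


def check_two_share_mul_add():
    """Exhaustively check correctness and uniformity of the output sharing."""
    for a, b, c in product((0, 1), repeat=3):
        counts = {0: 0, 1: 0}
        # Enumerate all uniform input sharings of (a, b, c).
        for a0, b0, c0 in product((0, 1), repeat=3):
            z0, z1 = shared_mul_add(a0, a ^ a0, b0, b ^ b0, c0, c ^ c0)
            if z0 ^ z1 != (a & b) ^ c:
                return False  # correctness violated
            counts[z0] += 1
        if counts[0] != counts[1]:
            return False  # output sharing is not uniform
    return True
```

The output share $z_0$ simplifies to $a_0 b + c_0$, so the fresh, uniformly distributed share $c_0$ is what makes the output sharing uniform without extra randomness.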
Table 3.1: Comparison of leakage for a 2-sharing (S2) and 3-sharing (S3) of
a bit x in a Hamming weight model. The 2-sharing (S2) shows a leakage in
the variance σ²(S2).
x   S2(x)      S3(x)                  wt(S2)   wt(S3)         µ(S2)   µ(S3)   σ²(S2)   σ²(S3)
0   {00, 11}   {000, 011, 101, 110}   {0, 2}   {0, 2, 2, 2}   1       3/2     2        1
1   {01, 10}   {001, 010, 100, 111}   {1, 1}   {1, 1, 1, 3}   1       3/2     0        1
3.1.1 Potential Risks
Share rotation In [34], it was proposed to rotate the shares in every step
to increase side-channel resistance. With only two shares, however, this
would be highly hazardous: if one share overwrites the other in a register,
the leakage depends on both shares and reveals the secret. Therefore,
register updates must be managed carefully during design.
Increased higher-order leakage. The dependence of the leakage variance on
the shared value x gives a theoretical explanation for the potential
higher-order leakage. As an example, we compare a 2-sharing S2 and a
3-sharing S3 of a bit x, where S2(x) = ⟨x0, x1⟩ and S3(x) = ⟨x0, x1, x2⟩.
We consider the Hamming weight leakage model (wt(·)) for the shares. The
means and variances of the possible states for both sharings are listed in
Table 3.1.
For both the 2-share and the 3-share threshold implementation of x, the mean
leakage µ(Si) is independent of the value of x. However, the variance of S2
depends on x, in particular var(S2(x = 0)) = 2 ≠ 0 = var(S2(x = 1)). For the
3-sharing S3, in contrast, the variances are identical in both cases.
This indicates a strong second-order leakage for 2-sharings.
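The entries of Table 3.1 can be reproduced by enumerating all sharings of a bit. The sketch below is illustrative only; the helper names are hypothetical. It uses Python's `statistics.variance`, i.e. the sample variance normalised by n - 1, because that normalisation matches the values 2, 0, 1, 1 listed in the table. With the population variance the 2-sharing would give 1 and 0 instead, which still differ.

```python
from itertools import product
from statistics import mean, variance


def sharings(x, n):
    """All n-share tuples of bits whose XOR equals the bit x."""
    return [s for s in product((0, 1), repeat=n) if sum(s) % 2 == x]


def hw_stats(x, n):
    """Mean and sample variance of the Hamming weights of all n-sharings of x."""
    weights = [sum(s) for s in sharings(x, n)]
    return mean(weights), variance(weights)


if __name__ == "__main__":
    for n in (2, 3):
        for x in (0, 1):
            print(f"S{n}(x={x}): mean, variance = {hw_stats(x, n)}")
```

The mean is the same for x = 0 and x = 1 in both sharings, while only the 2-sharing has a variance that depends on x.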
Chapter 4
Design and Implementation
4.1 Application to Present
In this section, we apply the two-share Threshold Implementation to the
Present cipher. In [24], the authors presented a 3-TI Present S-box. To
achieve this, they decomposed the non-linear S-box of degree 3 into two
applications of a quadratic function G combined with some linear functions,
and then implemented them with three shares. We follow their idea and use
the same decomposition, but implement it with 2-TI while still retaining
uniformity, non-completeness, and correctness. According to [24], the S-box
of Present can be decomposed as:
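$$S(X) = A\big(G(G(BX \oplus c)) \oplus d\big) \tag{4.1}$$
where G(·), A, B, and the constant vectors c and d are given as follows:
$$\begin{aligned}
G(x, y, z, w) &= (g_3, g_2, g_1, g_0): \\
g_3 &= x + yz + yw \\
g_2 &= w + xy \\
g_1 &= y \\
g_0 &= z + yw
\end{aligned} \tag{4.2}$$
$$A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \quad
c = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}, \quad
d = \begin{bmatrix} 0 & 1 & 0 & 1 \end{bmatrix} \tag{4.3}$$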
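As a sanity check, the decomposition (4.1) can be evaluated for all 16 inputs and compared with the Present S-box table from the cipher specification [8]. The sketch below relies on conventions that the text does not state and that are therefore assumptions: X is treated as a column vector (x, y, z, w) with x as the most significant bit, all arithmetic is over GF(2), and the helper names are hypothetical.

```python
# Present S-box table as given in the cipher specification [8].
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Matrices and constants from (4.3).
A = [(1, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0), (1, 0, 1, 1)]
B = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 0), (0, 1, 0, 1)]
C = (0, 0, 0, 1)
D = (0, 1, 0, 1)


def to_bits(n):
    """Nibble -> (x, y, z, w); assumption: x is the most significant bit."""
    return tuple((n >> (3 - i)) & 1 for i in range(4))


def from_bits(bits):
    """(x, y, z, w) -> nibble, inverse of to_bits."""
    return sum(b << (3 - i) for i, b in enumerate(bits))


def mat_vec(m, v):
    """Matrix times column vector over GF(2)."""
    return tuple(sum(r & x for r, x in zip(row, v)) & 1 for row in m)


def xor(u, v):
    return tuple(a ^ b for a, b in zip(u, v))


def g(v):
    """Quadratic function G from (4.2)."""
    x, y, z, w = v
    return (x ^ (y & z) ^ (y & w), w ^ (x & y), y, z ^ (y & w))


def sbox_decomposed(n):
    """S(X) = A(G(G(BX + c)) + d) as in (4.1)."""
    inner = g(g(xor(mat_vec(B, to_bits(n)), C)))
    return from_bits(mat_vec(A, xor(inner, D)))
```

Under these conventions, `[sbox_decomposed(n) for n in range(16)]` reproduces the S-box table entry by entry.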
4.1.1 Present with Two Shares
A 2-sharing scheme of G(.) can be expressed as follows:
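$$\begin{aligned}
G_0(x_0, y_0, z_0, w_0, x_1, y_1, z_1, w_1) &= (g_{03}, g_{02}, g_{01}, g_{00}) \\
g_{03} &= x_0 + y_0 z_0 + y_0 z_1 + y_0 w_0 + y_0 w_1 \\
g_{02} &= w_0 + x_0 y_0 + x_1 y_0 \\
g_{01} &= y_0 \\
g_{00} &= z_0 + y_0 w_0 + y_0 w_1
\end{aligned} \tag{4.4}$$
$$\begin{aligned}
G_1(x_0, y_0, z_0, w_0, x_1, y_1, z_1, w_1) &= (g_{13}, g_{12}, g_{11}, g_{10}) \\
g_{13} &= x_1 + y_1 z_0 + y_1 z_1 + y_1 w_0 + y_1 w_1 \\
g_{12} &= w_1 + x_0 y_1 + x_1 y_1 \\
g_{11} &= y_1 \\
g_{10} &= z_1 + y_1 w_0 + y_1 w_1
\end{aligned} \tag{4.5}$$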
The above sharing satisfies both correctness and uniformity when the
input shares are uniformly distributed. However, non-completeness is not
fulfilled since two shares of the same inputs are fed into the same functions
in some of the above equations.
We serialize the computations into two steps to achieve non-completeness
as illustrated in the following equations:
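$$\begin{aligned}
G^1_0(x_0, y_0, z_0, w_0) &= (g^1_{03}, g^1_{02}, g^1_{01}, g^1_{00}) \\
g^1_{03} &= x_0 + y_0 z_0 + y_0 w_0 \\
g^1_{02} &= w_0 + x_0 y_0 \\
g^1_{01} &= y_0 \\
g^1_{00} &= z_0 + y_0 w_0
\end{aligned} \tag{4.6}$$
$$\begin{aligned}
G^2_0(x_1, y_0, z_1, w_1, g^1_{03}, g^1_{02}, g^1_{01}, g^1_{00}) &= (g^2_{03}, g^2_{02}, g^2_{01}, g^2_{00}) \\
g^2_{03} &= g^1_{03} + y_0 z_1 + y_0 w_1 \\
g^2_{02} &= g^1_{02} + x_1 y_0 \\
g^2_{01} &= g^1_{01} \\
g^2_{00} &= g^1_{00} + y_0 w_1
\end{aligned} \tag{4.7}$$
$$\begin{aligned}
G^1_1(x_1, y_1, z_1, w_1) &= (g^1_{13}, g^1_{12}, g^1_{11}, g^1_{10}) \\
g^1_{13} &= x_1 + y_1 z_1 + y_1 w_1 \\
g^1_{12} &= w_1 + x_1 y_1 \\
g^1_{11} &= y_1 \\
g^1_{10} &= z_1 + y_1 w_1
\end{aligned} \tag{4.8}$$
$$\begin{aligned}
G^2_1(x_0, y_1, z_0, w_0, g^1_{13}, g^1_{12}, g^1_{11}, g^1_{10}) &= (g^2_{13}, g^2_{12}, g^2_{11}, g^2_{10}) \\
g^2_{13} &= g^1_{13} + y_1 z_0 + y_1 w_0 \\
g^2_{12} &= g^1_{12} + x_0 y_1 \\
g^2_{11} &= g^1_{11} \\
g^2_{10} &= g^1_{10} + y_1 w_0
\end{aligned} \tag{4.9}$$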
The superscript indicates the level (stage) of the circuit. At this point, we
have a correct, non-complete, and uniform two-share implementation of G(·).
The conversion of the remaining linear operations is discussed next.
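The correctness and uniformity of the two-stage sharing in (4.6) to (4.9) can be confirmed by enumerating all 256 pairs of unshared input and input sharing. The following is a minimal sketch under stated assumptions: GF(2) addition is XOR, multiplication is AND, the function names are hypothetical, and the register stage between the two levels is only represented by the split into two functions. Non-completeness is a structural property of the circuit and is not tested here.

```python
from itertools import product


def g(x, y, z, w):
    """Unshared quadratic function G from (4.2)."""
    return (x ^ (y & z) ^ (y & w), w ^ (x & y), y, z ^ (y & w))


def g_stage1(x, y, z, w):
    """First level, (4.6)/(4.8): same formula as G, applied to one share only."""
    return g(x, y, z, w)


def g_stage2(x_other, y_own, z_other, w_other, stage1_out):
    """Second level, (4.7)/(4.9): add the cross terms with the other share."""
    g3, g2, g1, g0 = stage1_out
    return (g3 ^ (y_own & z_other) ^ (y_own & w_other),
            g2 ^ (x_other & y_own),
            g1,
            g0 ^ (y_own & w_other))


def shared_g(s0, s1):
    """Two-share G; s0 and s1 are the share tuples (x, y, z, w)."""
    x0, y0, z0, w0 = s0
    x1, y1, z1, w1 = s1
    out0 = g_stage2(x1, y0, z1, w1, g_stage1(*s0))
    out1 = g_stage2(x0, y1, z0, w0, g_stage1(*s1))
    return out0, out1


def check_shared_g():
    """Exhaustive test of correctness and output uniformity."""
    for v in product((0, 1), repeat=4):
        seen = set()
        for s0 in product((0, 1), repeat=4):
            s1 = tuple(a ^ b for a, b in zip(v, s0))
            out0, out1 = shared_g(s0, s1)
            if tuple(a ^ b for a, b in zip(out0, out1)) != g(*v):
                return False  # shares do not recombine to G(v)
            seen.add(out0)
        if len(seen) != 16:
            return False  # some output sharing occurs more than once
    return True
```

Because the second output share is fixed by the first one and by G(v), counting the distinct values of the first output share is enough to test uniformity.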
4.1.2 Hardware Implementation
As depicted in Figure 4.1, we use registers to separate the two parts of G
in order to provide non-completeness. The second parts of the shares
($G^2_0$ and $G^2_1$) use not only the outputs of the first parts
($G^1_0$ and $G^1_1$) but also some of their inputs. One 6-bit register and
two 4-bit registers are placed before the second part of the G module to
store the inputs $x_0$, $x_1$, $z_0$, $z_1$, $w_0$, and $w_1$, and the
outputs of the first part of the G module, respectively.

Figure 4.1: Hardware architecture of the 2-share G module
Figure 4.2 depicts the S-box architecture, which includes two G modules,
the functions BX + c0 and AX + d0 for the first share, and the functions
BX + c1 and AX + d1 for the second share, where c0 + c1 = c and
d0 + d1 = d. Furthermore, to maintain non-completeness, we use another row
of registers between the two G(·) functions in the S-box. One may argue that
registers should also be inserted between non-linear functions (e.g., G(·))
and linear functions (e.g., AX + d0), since merging them may recombine the
two shares of certain variables, which would violate the non-completeness
requirement. While this is true in general, our design avoids this problem
because $G^2_0$ and $G^2_1$ are each independent of one share of the inputs.
Hence any linear combination of $g^2_{13}, g^2_{12}, g^2_{11}, g^2_{10}$ or
of $g^2_{03}, g^2_{02}, g^2_{01}, g^2_{00}$ still satisfies non-completeness.
Figure 4.2: Hardware architecture of the 2-share S-box module

Figure 4.3 shows the whole Present cipher with two shares. The design has
two control inputs, key_load and data_load. If key_load is high, the 80-bit
input key shares Key A and Key B are copied to the registers Key A and
Key B, respectively, at the rising edge of the clock signal. When data_load
is high, the 64 right-most bits of the input shares (data_in A[63:0],
data_in B[63:0]) are copied to the state registers at the rising edge of the
clock signal. Setting data_load, i.e., loading two new plaintext shares into
the state registers, also resets the state machine. That is why this design
does not have a reset signal. Once the two key shares and the two plaintext
shares are loaded, both key_load and data_load must be set to zero. After
that, it takes 31 rounds until Data_out A and Data_out B hold valid
ciphertext shares.

Figure 4.3: Hardware architectures of the 2-shares Present Cipher.

In each round, the S-box and the permutation are applied in turn to update
the state registers for the next round. In the hardware design, each G(·)
function needs one cycle, and every S-box needs four clock cycles to compute
its table lookup. According to Figure 4.3, each 64-bit input stored in the
State register passes through the S-box 16 times, once per nibble. The first
S-box evaluation takes 4 clock cycles due to its latency, the other 15
evaluations take 15 further clock cycles in the pipeline, and the permutation
operation takes one more clock cycle. Therefore, we need 20 cycles for each
round of the Present cipher. We define another control signal, 'counter',
which updates the state registers and key registers every 20 cycles. In each
of these 20 cycles, the state registers are shifted to the right by 4 bits,
and the four most significant bits of the state registers are replaced by
the outputs of the substitution and permutation network. The Present cipher
has 31 rounds. Hence, a full encryption of a 64-bit input takes 620 clock
cycles. We also design an unprotected Present cipher to show the area
overhead of the protected Present relative to the unprotected one, as well
as its impact on maximum frequency and throughput. The comparison results
are shown in Table 5.1.
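The cycle count above follows from a simple pipeline model, sketched below. The function and parameter names are hypothetical; the default values (16 nibbles, an S-box latency of 4 cycles, 1 cycle for the permutation, 31 rounds) are the ones stated in the text.

```python
def cycles_per_round(n_nibbles=16, sbox_latency=4, permutation_cycles=1):
    """Cycles for one round of the nibble-serial, pipelined design."""
    # The first nibble pays the full S-box latency; every further nibble
    # leaves the pipeline one cycle after its predecessor.
    return sbox_latency + (n_nibbles - 1) + permutation_cycles


def encryption_cycles(rounds=31, **round_params):
    """Total cycles for one block, assuming all rounds take equally long."""
    return rounds * cycles_per_round(**round_params)


if __name__ == "__main__":
    print(cycles_per_round())   # 20
    print(encryption_cycles())  # 620
```

Changing `sbox_latency` in this model shows how additional register stages in the S-box would affect the per-round and total latency.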
As mentioned before, the most complex and important part of the
implementation is the non-linear part of the cipher, i.e., the S-box. At the
beginning of this chapter, we showed how the S-box can be decomposed into
smaller parts. The G function is the non-linear part, and its formula was
given. We then applied the two-share threshold technique to it and showed
how the G function can be implemented in hardware so that the conditions
required by the threshold implementation technique are met. Afterwards, the
linear parts were attached to the G function to build the S-box. Finally, we
presented the hardware architecture of the two-share Present cipher.
Chapter 5
Results
5.1 Implementation Results
Table 5.1 summarizes the overhead and performance of the two-share
implementation of the Present cipher. Note that we only implement
Present-64/80 as an example to show the advantage of the two-share scheme.
All the designs are implemented in Verilog and synthesized for
Virtex-5 (xc5vlx50).^1

^1 The results of this work were published at ASIACRYPT 2016 [11].

Table 5.1: Implementation results of two-share Present (Present on Virtex-5).

Design         Slice (Regs)   Slice (LUTs)   Max. Frequency (MHz)   Throughput (Mbps)
3-TI Present   466 (3.0x)     715 (3.1x)     397.289                45.567
2-TI Present   370 (2.4x)     742 (3.2x)     490.252                50.61
Present        154 (1x)       234 (1x)       394.563                40.73

Table 5.2: Area complexity comparison for different implementations of the
Present cipher.

Year   Ref        Algorithm           Platform                       Slice (Regs)   Slice (LUTs)   Max. Freq (MHz)
2012   [18]       Present Serial      Virtex-5 XC5VLX50 FF324-1      203            237            245.76
2012   [18]       Present Iterative   Virtex-5 XC5VLX50 FF324        200            285            250.89
2015   [40]       Present             Virtex-5 XC5VLX50              201            222            236.57
2016   [25]       PRE                 Spartan-6 XC6SLX16 CSG324C     136            229            221.63
2016   [25]       PRE 01              Spartan-6 XC6SLX16 CSG324C     137            308            160.13
2016   [25]       PRE 02              Spartan-6 XC6SLX16 CSG324C     89             226            172.92
2016   our work   Present             Virtex-5 XC5VLX50              154            234            394.563

Concerning Present, we have three implementations: unprotected, regular
3-TI, and the novel 2-TI Present. In terms of slice registers, the regular
3-TI implementation uses about three times as many as the unprotected one.
This is because extra registers are needed to guarantee the non-completeness
of a first-order resistant 3-share Present cipher. For the same reason, the
2-share implementation costs more than twice as many registers as the
unprotected Present. It is worth mentioning that the 2-TI first-order
resistant implementation nevertheless uses fewer registers than 3-TI. For
example, we use extra registers in the G(·) function, as explained in
Section 4.1. These registers help to shorten the critical path, which
explains the speed-up and the resulting increase in throughput for 2-TI
Present. Leakage analysis of the circuits was done by Cong Chen.
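The throughput column of Table 5.1 can be related to the maximum frequency with a one-line formula. The sketch below assumes that throughput is block size times clock frequency divided by cycles per block; the function name is hypothetical. With 620 cycles per 64-bit block this reproduces the reported values for the 2-TI and the unprotected design. The 3-TI figure would correspond to about 558 cycles per block under the same formula, which is an inference from the table and not a number stated in the text.

```python
def throughput_mbps(freq_mhz, block_bits=64, cycles_per_block=620):
    """Throughput in Mbps for one block processed every `cycles_per_block` cycles."""
    return block_bits * freq_mhz / cycles_per_block


if __name__ == "__main__":
    # Frequencies taken from Table 5.1.
    print(round(throughput_mbps(490.252), 2))  # 2-TI Present
    print(round(throughput_mbps(394.563), 2))  # unprotected Present
    # Assumption: 558 cycles per block is inferred from the table only.
    print(round(throughput_mbps(397.289, cycles_per_block=558), 3))
```

The formula also makes the trade-off visible: the 2-TI design gains throughput only through its higher clock frequency, since its cycle count per block matches that of the unprotected design.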
Table 5.2 compares the area complexity of our Present cipher implementation
with other works. All of them use an 80-bit key and a 64-bit datapath,
except the PRE 02 implementation, which uses a 16-bit datapath. The
architecture proposed in this work can be appealing for some applications.
Since we decided to apply the Threshold Implementation technique to the
Present cipher, and this technique can be complex to apply to the nonlinear
functions of the cipher, we decided to minimize the nonlinear part.
Therefore, we design the Present cipher so that only two S-boxes are needed:
one S-box in the substitution layer, applied to the data in every round, and
one S-box in the key update, performing the substitution that updates the
key in every round. This solution helps in two ways. First, it reduces the
size of the hardware implementation and makes it compact. Second, it allows
the Threshold Implementation technique, described in Chapter 3, to be
applied in a more straightforward way. However, this architecture leads to a
worse latency. The methods proposed in [25] aim at smaller and faster
implementations, considering area-performance tradeoffs. The aim of the work
in [25] is to reduce the datapath width as a whole for both the substitution
layer and the permutation layer, and its novel architecture has a 16-bit
datapath. While that architecture is better optimized for area, it uses
16 S-boxes, which makes it complex and unsuitable for the TI technique.
5.1.1 Theoretical Analysis
First, we discuss the strong second-order leakage of the two-share TI scheme,
using the two-share Present S-box look-up as a target, namely the
key-dependent intermediate value y = S(x ⊕ k), where x, y, and k are the
4-bit plaintext input, the S-box output, and the sub-key, respectively.
Figure 5.1: First-order leakage analysis of synthetic data. Left (a):
first-order paired t-test (t value versus number of traces). Right (b):
first-order CPA (maximum absolute correlation versus number of traces); the
red line corresponds to the correct key guess.
Synthetic samples and leakage model. First, we generate noise-free
synthetic leakage samples of the 2-TI Present S-box based on the Hamming
weight model. As shown in Section 4.1, a 2-TI S-box processes two shares
(4 bits each) in parallel, and hence we use the Hamming weight of both
output shares (8 bits in total) as the synthetic leakage sample. For a
second-order analysis, the synthetic data are additionally centered and then
squared. As the leakage model, we use the Hamming weight of the regular
S-box output, which equals the bitwise XOR of the two output shares of the
2-TI S-box.
First-order analysis. We perform a first-order non-specific paired t-test on
the synthetic data and also attempt to exploit any leakage using classic
CPA. For this purpose, 1 million synthetic leakage samples are generated for
random input plaintexts and another 1 million for fixed inputs. The result
of the t-test on these 2 million samples is shown in Figure 5.1(a): the
t value stays below 2 as the number of traces (synthetic samples) increases
to 2 million. Then, a classic first-order CPA is performed on the 1 million
samples associated with the random inputs, using the above-mentioned leakage
model. The results in Figure 5.1(b) show that the correct key cannot be
distinguished from the wrong key hypotheses even with 1 million samples, so
the attack fails.
Figure 5.2: Second-order leakage analysis of synthetic data. Left (a):
second-order paired t-test (t value versus number of traces). Right (b):
second-order CPA (maximum absolute correlation versus number of traces); the
red line corresponds to the correct key guess.
Second-order analysis. We then proceed with the second-order non-specific
paired t-test and CPA. For this purpose, 200 synthetic leakage samples are
generated for random input plaintexts and another 200 for fixed inputs.
Figure 5.2(a) shows that the t value exceeds 4.5 after only a couple of
hundred samples, while classic CPA recovers the correct key with fewer than
a hundred samples, as shown in Figure 5.2(b).
In summary, the theoretical analysis confirms the first-order resistance of
the 2-TI scheme but reveals a strong second-order leakage. This strong
second-order leakage is caused by the differing variances, as pointed out in
Section 3.1.1. Note that we use a perfect Hamming weight model for the
synthetic data without adding any noise. Hence, CPA with a Hamming weight
model can efficiently recover the key because it captures the leakage well.
In fact, CPA on a perfect Hamming weight leakage is comparable to a profiled
attack in the absence of noise. In the real world, however, actual leakages
are more complex, and CPA with a Hamming weight model will not be as
efficient as in this synthetic scenario. In the following, we analyze
practical implementations to show this.
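The effect of the differing variances can be made explicit without any sampling. The sketch below is an idealized model and not the thesis simulation: it assumes that the 4-bit S-box output y is uniformly shared as (y0, y ⊕ y0), that the leakage is the Hamming weight of both shares, and that moments are normalised by the number of sharings (population normalisation). The function names are hypothetical.

```python
def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def leakage_moments(y, bits=4):
    """Mean and centered second moment of HW(y0) + HW(y1) over all sharings of y."""
    n = 1 << bits
    # Assumption: y0 is uniform and the second share is y1 = y XOR y0.
    samples = [hw(y0) + hw(y ^ y0) for y0 in range(n)]
    mu = sum(samples) / n
    second = sum((s - mu) ** 2 for s in samples) / n
    return mu, second


if __name__ == "__main__":
    for y in range(16):
        print(y, hw(y), leakage_moments(y))
```

In this model the mean is 4 for every y, so a first-order attack sees nothing, while the centered and squared leakage has expectation 4 - HW(y). The second-order signal is therefore perfectly (negatively) correlated with the Hamming weight model, which is consistent with the very small number of samples needed in the synthetic second-order CPA.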
5.1.2 Practical Analysis
Next, we discuss the leakage analysis results for the two-share
implementation of Present. First, we apply the non-specific paired t-test
method from [16] to detect any data-dependent leakage. Fixed (F) and random
(R) measurements are interleaved using the FRRF pattern.
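For orientation, a paired t statistic in its textbook form can be sketched as follows. This is not the exact procedure of [16], whose details (including its treatment of higher orders) may differ: the sketch assumes that each fixed measurement has already been paired with a neighbouring random measurement, uses a single sample point per trace, and applies a global center-and-square step as second-order preprocessing. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def center_square(samples):
    """Second-order preprocessing: subtract the mean, then square."""
    mu = mean(samples)
    return [(s - mu) ** 2 for s in samples]


def paired_t(fixed, random_):
    """Paired t statistic on the differences of paired measurements."""
    diffs = [f - r for f, r in zip(fixed, random_)]
    # Assumption: the differences are not all identical (non-zero variance).
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


if __name__ == "__main__":
    fixed = [2.0, 5.0, 3.5, 4.0]
    random_ = [1.0, 2.0, 3.0, 4.5]
    print(paired_t(fixed, random_))                                  # first order
    print(paired_t(center_square(fixed), center_square(random_)))    # second order
```

A leakage is usually reported when the absolute value of the statistic exceeds the threshold of 4.5 used in the text.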
2-TI Present. 10 million traces are captured for the two-share Present
implementation and then analyzed using the paired t-test. The first-order
t-statistic is still below 4.5 after 10 million measurements, as shown in
Figure 5.3(a). The second-order t-statistic exceeds the threshold after
about 6000 traces, as shown in Figure 5.3(b). Again, the results suggest
that two-share TI delivers the promised first-order resistance but fails
badly with respect to second-order resistance.
Figure 5.3: Leakage detection results (paired t-test, maximum t value versus
number of traces) for the two-share implementation of Present for first
order (left, a) and second order (right, b) leakage over the number of
traces. Note that the dimensions change for both axes.
We performed a second-order CPA on 5 million random traces (centered and
then squared) of 2-TI Present, targeting the S-box output to exploit the
leakage. Recall that in our 2-TI Present the 64-bit state registers are
rotated right by 4 bits per clock cycle, so that the least significant
nibble is continuously fed into the S-box look-up and the output is written
back to the most significant nibble after 4 clock cycles. Therefore, a
Hamming distance leakage occurs between consecutive output nibbles. In this
attack, we use the Hamming distance power model between the first two
consecutive S-box outputs, which depends on the least significant key byte,
and thus $2^8$ key hypotheses are required. The maximum correlations per key
hypothesis over the number of traces are shown in Figure 5.4. The results
show that the correct key can be successfully recovered with more than
1 million traces, which demonstrates the practical exploitability of the
detected leakage.

Figure 5.4: Second-order CPA of two-share Present (maximum correlation
versus number of traces)
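The Hamming distance model used in this attack can be sketched as follows. The sketch makes assumptions that go beyond the text: the targeted intermediate values are the S-box outputs of the two least significant nibbles in the first round, the round key is added by XOR before the S-box, and the lower nibble is processed first. The function names are hypothetical; the S-box table is the one from the Present specification [8].

```python
# Present S-box table as given in the cipher specification [8].
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hd_hypothesis(pt_byte, key_byte):
    """Predicted Hamming distance between two consecutive S-box outputs."""
    v = pt_byte ^ key_byte                      # assumed first-round key addition
    first = PRESENT_SBOX[v & 0xF]               # least significant nibble
    second = PRESENT_SBOX[(v >> 4) & 0xF]       # next nibble
    return bin(first ^ second).count("1")


def all_hypotheses(pt_byte):
    """Model values for all 2^8 guesses of the least significant key byte."""
    return [hd_hypothesis(pt_byte, k) for k in range(256)]
```

In a CPA, each of the 256 model vectors (one per key guess, over all traces) would be correlated with the centered and squared measurements, and the guess with the highest absolute correlation would be taken as the key candidate.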
These results validate our simulation analysis for the idealized case from
Section 3.1.1 and Section 5.1.1, which suggested high second-order leakage.
Unlike the theoretical analysis in Section 5.1.1, where the numbers of
traces needed for a successful second-order t-test and CPA are of the same
order of magnitude, the practical second-order CPA with the Hamming distance
model needs far more traces to exploit the leakage than the t-test needs to
detect it (only hundreds to thousands of traces). This is mainly because:
1) practical implementations do not exhibit a perfect Hamming weight or
Hamming distance leakage; 2) noise also renders the practical attacks
inefficient.
While two-share TI shows potential in preventing first order leakage with
less overhead, its poor performance on second order leakage resistance com-
pared with three-sharing makes it less worthwhile.
Chapter 6
Conclusion
This work offers the first practical threshold implementation with only two
shares. We explained why we use a lightweight cipher, Present, as the target
for the threshold implementation. Moreover, we showed how using two shares
can lead to a more compact cipher implementation that needs less randomness.
Although moving to two shares makes the implementation of the nonlinear part
of the cipher more complex, it reduces the area overhead of the cipher.
Moreover, leakage analysis shows that this implementation retains its
first-order resistance.
Regarding leakage analysis, although the two-share implementation shows the
first-order resistance it was designed for, it also has a strong
second-order leakage. This puts the practical value of the technique in
doubt, since three-share TI also provides first-order resistance and resists
higher-order attacks better than two-share TI. Deciding the practical
validity of the two-share technique may deserve further analysis.
6.1 Future Work
It could be interesting and beneficial to apply higher-order TI to the
Present cipher and analyze the results to evaluate the complexity and
overhead of the design. The way the circuit is designed could also be
changed and improved. Furthermore, a new cipher, GIFT, has recently been
proposed by Subhadeep Banik et al. [3]. It is claimed that this new cipher
uses a smaller area and has faster performance while correcting the
well-known weakness of Present regarding linear hulls. Hence, this cipher
could be a good target for the TI technique.
Bibliography
[1] Alexandre Adomnicai, Benjamin Lac, Anne Canteaut, Jacques Fournier,
Laurent Masson, Renaud Sirdey, and Assia Tria. On the importance of
considering physical attacks when implementing lightweight cryptogra-
phy. In Lightweight Cryptography Workshop— NIST, 2016.
[2] A. Aysu, E. Gulcan, and P. Schaumont. SIMON Says: Break Area
Records of Block Ciphers on FPGAs. Embedded Systems Letters, IEEE,
6(2):37–40, June 2014.
[3] Subhadeep Banik, Sumit Kumar Pandey, Thomas Peyrin, Yu Sasaki,
Siang Meng Sim, and Yosuke Todo. Gift: A small present. Cryptographic
Hardware and Embedded Systems-CHES, pages 25–28, 2017.
[4] B. Bilgin, B. Gierlichs, S. Nikova, V. Nikov, and V. Rijmen. Trade-Offs
for Threshold Implementations Illustrated on AES. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems,
34(7):1188–1200, July 2015.
[5] Begül Bilgin, Joan Daemen, Ventzislav Nikov, Svetla Nikova, Vincent
Rijmen, and Gilles Van Assche. Efficient and First-Order DPA Resistant
Implementations of Keccak. In Aurélien Francillon and Pankaj
Rohatgi, editors, Smart Card Research and Advanced Applications,
Springer LNCS, pages 187–199. 2014.
[6] Begül Bilgin, Benedikt Gierlichs, Svetla Nikova, Ventzislav Nikov, and
Vincent Rijmen. A More Efficient AES Threshold Implementation. In
David Pointcheval and Damien Vergnaud, editors, Progress in Cryptology –
AFRICACRYPT 2014, volume 8469 of Springer LNCS, pages
267–284. 2014.
[7] Begül Bilgin, Benedikt Gierlichs, Svetla Nikova, Ventzislav Nikov, and
Vincent Rijmen. Higher-Order Threshold Implementations. In Palash
Sarkar and Tetsu Iwata, editors, Advances in Cryptology – ASIACRYPT
2014, volume 8874 of Springer LNCS, pages 326–343. 2014.
[8] A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann,
M. J. B. Robshaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An Ultra-
Lightweight Block Cipher. In Pascal Paillier and Ingrid Verbauwhede,
editors, Cryptographic Hardware and Embedded Systems - CHES 2007:
9th International Workshop, Vienna, Austria, September 10-13, 2007.
Proceedings, pages 450–466, Berlin, Heidelberg, 2007. Springer Berlin
Heidelberg.
[9] Eric Brier, Christophe Clavier, and Francis Olivier. Correlation power
analysis with a leakage model. In International Workshop on Crypto-
graphic Hardware and Embedded Systems, pages 16–29. Springer, 2004.
[10] Cong Chen, Thomas Eisenbarth, Ingo von Maurich, and Rainer Stein-
wandt. Masking Large Keys in Hardware: A Masked Implementation
of McEliece. In Selected Areas in Cryptography — SAC 2015. Springer
LNCS, August 2015. Preprint available at http://eprint.iacr.org/
924.
[11] Cong Chen, Mohammad Farmani, and Thomas Eisenbarth. A tale of
two shares: why two-share threshold implementation seems worthwhile-
and why it is not. In Advances in Cryptology–ASIACRYPT 2016: 22nd
International Conference on the Theory and Application of Cryptology
and Information Security, Hanoi, Vietnam, December 4-8, 2016, Pro-
ceedings, Part I 22, pages 819–843. Springer, 2016.
[12] Jean-Sébastien Coron, Emmanuel Prouff, and Matthieu Rivain. Side
Channel Cryptanalysis of a Higher Order Masking Scheme. In Pascal
Paillier and Ingrid Verbauwhede, editors, Cryptographic Hardware and
Embedded Systems - CHES 2007: 9th International Workshop, Vienna,
Austria, September 10-13, 2007. Proceedings, pages 28–44, Berlin,
Heidelberg, 2007. Springer Berlin Heidelberg.
[13] Christophe De Cannière, Orr Dunkelman, and Miroslav Knežević.
KATAN and KTANTAN - A Family of Small and Efficient Hardware-Oriented
Block Ciphers. In Cryptographic Hardware and Embedded
Systems–CHES 2009, pages 272–288. Springer, 2009.
[14] Thomas De Cnudde, Oscar Reparaz, Begül Bilgin, Svetla Nikova,
Ventzislav Nikov, and Vincent Rijmen. Masking AES with d+1 Shares in
Hardware. In International Conference on Cryptographic Hardware and
Embedded Systems, pages 194–212. Springer, 2016.
[15] Thomas De Cnudde, Oscar Reparaz, Begül Bilgin, Svetla Nikova,
Ventzislav Nikov, and Vincent Rijmen. Masking AES with d+1 Shares
in Hardware. In Benedikt Gierlichs and Y. Axel Poschmann, editors,
Cryptographic Hardware and Embedded Systems – CHES 2016: 18th
International Conference, pages 194–212. Springer Berlin Heidelberg,
2016.
[16] A. Adam Ding, Cong Chen, and Thomas Eisenbarth. Simpler, Faster,
and More Robust T-test Based Leakage Detection. In Constructive
Side-Channel Analysis and Secure Design - 7th International Workshop,
COSADE 2016, Graz, Austria, April 14-15, 2016, Revised Selected Pa-
pers, pages 163–183.
[17] Thomas Eisenbarth and Sandeep Kumar. A survey of lightweight-
cryptography implementations. IEEE Design & Test of Computers,
24(6), 2007.
[18] Neil Hanley and Máire O'Neill. Hardware comparison of the ISO/IEC
29192-2 block ciphers. In VLSI (ISVLSI), 2012 IEEE Computer Society
Annual Symposium on, pages 57–62. IEEE, 2012.
[19] M Joye and JJ Quisquater. Ches 2004. lncs, vol. 3156, 2004.
[20] Elif Bilge Kavun and Tolga Yalcin. RAM-Based Ultra-Lightweight
FPGA Implementation of PRESENT. In Reconfigurable Computing and
FPGAs (ReConFig), 2011 International Conference on, pages 280–285.
IEEE, 2011.
[21] M. Kirschbaum and T. Popp. Evaluation of a DPA-Resistant Prototype
Chip. In Computer Security Applications Conference, 2009. ACSAC
’09. Annual, pages 43–50, Dec 2009.
[22] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power
analysis. In Advances in Cryptology - CRYPTO '99, pages 789–789. Springer,
1999.
[23] Yogesh Kumar, Rajiv Munjal, and Harsh Sharma. Comparison of sym-
metric and asymmetric cryptography with existing vulnerabilities and
countermeasures. International Journal of Computer Science and Man-
agement Studies, 11(03), 2011.
[24] Sebastian Kutzner, PhuongHa Nguyen, Axel Poschmann, and Huaxiong
Wang. On 3-Share Threshold Implementations for 4-Bit S-boxes. In
Emmanuel Prouff, editor, Constructive Side-Channel Analysis and Secure
Design, volume 7864 of Springer LNCS, pages 99–113. 2013.
[25] Carlos Andres Lara-Nino, Miguel Morales-Sandoval, and Arturo Diaz-
Perez. Novel fpga-based low-cost hardware architecture for the present
block cipher. In Digital System Design (DSD), 2016 Euromicro Confer-
ence on, pages 646–650. IEEE, 2016.
[26] Gregor Leander, Christof Paar, Axel Poschmann, and Kai Schramm.
New lightweight des variants. In International Workshop on Fast Soft-
ware Encryption, pages 196–210. Springer, 2007.
[27] Diana Maimut and Khaled Ouafi. Lightweight cryptography for rfid
tags. IEEE Security & Privacy, 10(2):76–79, 2012.
[28] S. Mangard, M. E. Oswald, and T. Popp. Power Analysis Attacks -
Revealing the Secrets of Smart Cards. Springer, 2007.
[29] Mohit Mittal. Performance evaluation of cryptographic algorithms. In-
ternational Journal of Computer Applications, 41(7), 2012.
[30] Amir Moradi and Oliver Mischke. How Far Should Theory Be from
Practice?, pages 92–106. Springer Berlin Heidelberg, Berlin, Heidelberg,
2012.
[31] Amir Moradi, Axel Poschmann, San Ling, Christof Paar, and Huaxiong
Wang. Pushing the Limits: A Very Compact and a Threshold Implemen-
tation of AES. In Kenneth G. Paterson, editor, Advances in Cryptology
— EUROCRYPT 2011, volume 6632 of Springer LNCS, pages 69–88.
2011.
[32] Svetla Nikova, Christian Rechberger, and Vincent Rijmen. Threshold
Implementations Against Side-Channel Attacks and Glitches. In Peng
Ning, Sihan Qing, and Ninghui Li, editors, Information and Communi-
cations Security, volume 4307 of Springer LNCS, pages 529–545. 2006.
[33] Pascal Paillier and Ingrid Verbauwhede. Cryptographic hardware and
embedded systems-ches 2007. In International Workshop on Crypto-
graphic Hardware and Embedded Systems, pages E1–E1. Springer, 2007.
[34] Axel Poschmann, Amir Moradi, Khoongming Khoo, Chu-Wee Lim,
Huaxiong Wang, and San Ling. Side-Channel Resistant Crypto for less
than 2,300 GE. Journal of Cryptology, 24(2):322–345, 2011.
[35] Oscar Reparaz, Begül Bilgin, Svetla Nikova, Benedikt Gierlichs, and
Ingrid Verbauwhede. Consolidating Masking Schemes, pages 764–783.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.
[36] Oscar Reparaz, Sujoy Sinha Roy, Frederik Vercauteren, and Ingrid Ver-
bauwhede. A Masked Ring-LWE Implementation. In Cryptographic
Hardware and Embedded Systems–CHES 2015, pages 683–702. Springer,
2015.
[37] Carsten Rolfes, Axel Poschmann, Gregor Leander, and Christof Paar.
Ultra-Lightweight Implementations for Smart Devices–Security for 1000
Gate Equivalents. In Smart Card Research and Advanced Applications,
pages 89–103. Springer, 2008.
[38] Aria Shahverdi. Lightweight Cryptography Meets Threshold Implemen-
tation: A Case Study for Simon. PhD thesis, WORCESTER POLY-
TECHNIC INSTITUTE, 2015.
[39] Aria Shahverdi, Mostafa Taha, and Thomas Eisenbarth. Silent Simon:
A Threshold Implementation under 100 Slices. In Hardware Oriented
Security and Trust (HOST), 2015 IEEE International Symposium on,
pages 1–6, May 2015.
[40] JJ Tay, MLD Wong, MM Wong, C Zhang, and I Hijazin. Compact fpga
implementation of present with boolean s-box. In Quality Electronic
Design (ASQED), 2015 6th Asia Symposium on, pages 144–148. IEEE,
2015.
[41] Kris Tiri and Ingrid Verbauwhede. A Logic Level Design Methodology
for a Secure DPA Resistant ASIC or FPGA Implementation. In Pro-
ceedings of the Conference on Design, Automation and Test in Europe -
Volume 1, DATE ’04, pages 10246–, Washington, DC, USA, 2004. IEEE
Computer Society.
[42] Ritu Tripathi and Sanjay Agrawal. Comparative study of symmetric and
asymmetric cryptography techniques. International Journal of Advance
Foundation and Research in Computer (IJAFRC), 1(6):68–76, 2014.
