Higher-Order Glitch Resistant Implementation of the PRESENT S-Box by De Cnudde, Thomas et al.
Thomas De Cnudde1, Begül Bilgin1,2, Oscar Reparaz1, and Svetla Nikova1
1 KU Leuven, ESAT-COSIC and iMinds, Belgium
{name.surname}@esat.kuleuven.be
2 University of Twente, EEMCS-SCS, The Netherlands
Abstract. Glitches, which arise from unwanted switching of CMOS gates,
have been shown to leak information even when side-channel countermeasures
are applied to hardware cryptosystems. The polynomial masking
scheme presented at CHES 2011 by Roche et al. is a method that of-
fers provable security against side-channel analysis at any order even in
the presence of glitches. The method is based on Shamir’s secret sharing
and its computations rely on a secure multi-party computation proto-
col. At CHES 2013, Moradi et al. presented a first-order glitch resistant
implementation of the AES S-box based on this method. Their work
showed that the area and speed overheads resulting from the polynomial
masking are high. In this paper, we present a first-order glitch resistant
implementation of the PRESENT S-box, which is designed for lightweight
applications, resulting in lower area and randomness requirements.
Moreover, we provide a second-order glitch resistant implementation of this
S-box and observe the increase in implementation requirements.
Keywords: Polynomial masking, Glitches, Sharing, PRESENT, S-box
1 Introduction
Radio frequency identification (RFID) systems, wireless sensor networks, smart
cards and other compact mobile applications have become prevalent in everyday
life. Their widespread deployment in applications ranging from supply chains to
intelligent homes and even electronic body implants has made their security a
pressing issue. While block ciphers provide sufficient security against
cryptanalysis for these applications, their hardware implementations are
susceptible to side-channel leakage. By exploiting this leakage through
side-channel analysis (SCA), a cryptosystem can be compromised more easily
than its cryptanalytic security suggests. A common form of side-channel
analysis is Differential Power Analysis (DPA) [14]. DPA exploits dependencies
between the instantaneous power consumption of a device and the intermediate
values arising in the computation of a cryptographic operation.
Several countermeasures have been proposed to cope with these side channels.
Secure logic styles that balance the power consumption of different data
values [24] can be used, or noise can be increased in the form of random delays,
random execution orders or inserted dummy operations [25]. Even though
an analysis becomes harder as this noise increases, these techniques do not
provide provable security. A popular countermeasure that does provide provable
security under certain assumptions is masking [5,10]. This method conceals
sensitive information, such as key and plaintext related information, using
random values. Compared to a naive implementation, a well-implemented masked
implementation typically offers more resistance against power analysis attacks, and
makes the attack much more expensive as the order d of the masking increases.
This masking order d in turn defines the order d + 1 of the attack needed to
retrieve the sensitive information. This attack order sets the number of shares
that are jointly exploited by either analyzing the (d+ 1)th-order statistical mo-
ment of the leakage at one point in time or by nonlinearly combining leakages
from d + 1 points in time. Such an attack is known as a (d + 1)th-order DPA
attack. A dth-order secure implementation can consequently always be broken
by a (d + 1)th-order attack. When the attack order is larger than one, this is
known as a higher-order DPA (HO-DPA) attack [5, 17].
Masking is, however, undermined by the switching behaviour of CMOS
transistors, the so-called glitching effect [15, 16]. Two masking schemes that show
provable security against DPA in the presence of glitches, or glitch resistance
for short, are the polynomial masking scheme [22] and threshold
implementations [19]. While, at the time of writing, the latter achieves glitch resistance
at the first order only, the former provides this security also for higher orders.
Therefore, we consider the polynomial masking scheme in this paper.
Masking introduces an overhead on the area and throughput. To avoid overly
large and slow implementations, we will focus on lightweight, i.e. compact and
power efficient, block ciphers. A popular lightweight block cipher is PRESENT [3]
which, as of 2012, is part of the ISO/IEC 29192-2 standard [13], making its
side-channel resistance relevant. Besides PRESENT, its S-box is also used in other
lightweight cryptographic algorithms, including the LED block cipher [12], the
GOST revisited block cipher [20] and the PHOTON lightweight hash function [11].
In this paper, we focus on glitch resistant implementations of the nonlinear part
of PRESENT, the S-box, since this is typically the most challenging part of a
masked implementation.
Related work. An algorithmic description of a first-order glitch resistant
Advanced Encryption Standard (AES) implementation using the polynomial
masking scheme is given in [22]. In [18], this description is used to implement a
first-order glitch resistant AES S-box on an FPGA. The PRESENT S-box has, to our
knowledge, not yet been implemented using polynomial masking.
Contribution. In this paper, we present a first- and a second-order
polynomially masked implementation of the 4-bit PRESENT S-box. To our knowledge,
this is the first second-order PRESENT S-box implementation showing resistance
against second-order DPA in the presence of glitches. The implementations are
based on the guidelines for the first-order glitch resistant AES implementation
proposed in [18]. We also present experimental confirmation showing that the
implementations indeed achieve their claimed security. To this end, we applied
univariate and bivariate leakage detection based on Welch’s t-test.
Organization. Section 2 introduces the necessary background regarding the
polynomial masking scheme and the PRESENT S-box. The design decisions,
hardware implementations and their costs are presented in Section 3. The SCA results
are shown in Section 4. Finally, the conclusion is drawn in Section 5.
2 Preliminaries
2.1 PRESENT Block Cipher
The PRESENT block cipher [3] is a symmetric key encryption algorithm designed
considering the heavy constraints on performance, area and timing requirements
of lightweight hardware applications. Its block length equals 64 bits. Key lengths
of 80 and 128 bits are supported, which are referred to as PRESENT-80 and
PRESENT-128 respectively. For lightweight applications, PRESENT-80 is
recommended. The PRESENT cipher performs 31 rounds followed by a final key
whitening stage. Each round consists of a binary addition with the round key and a
substitution-permutation network. The permutation layer is bit oriented and
can easily be implemented by wiring, making it very hardware friendly. The
substitution layer applies 16 identical 4-bit S-boxes governed by the following
table:
Table 1. 4-bit to 4-bit substitution of the PRESENT S-box [3] in hexadecimal notation.
z    | 0 1 2 3 4 5 6 7 8 9 A B C D E F
S[z] | C 5 6 B 9 0 A D 3 E F 8 4 7 1 2
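As a minimal sketch, the substitution layer can be written as a table lookup applied to each nibble of the state. The table values are those of Table 1; the function name `sbox_layer` and the representation of the 64-bit state as a Python integer are assumptions made for illustration only.

```python
# PRESENT S-box from Table 1, indexed by the 4-bit input z.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    """Apply the S-box to each of the 16 nibbles of a 64-bit state (hypothetical helper)."""
    out = 0
    for i in range(16):
        nibble = (state >> (4 * i)) & 0xF
        out |= SBOX[nibble] << (4 * i)
    return out
```

Since all 16 S-boxes are identical, a masked implementation of this single 4-bit table is the only nonlinear component that needs protection.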
2.2 Polynomial Masking Scheme
Side-channel resistance in the presence of glitches can be achieved at any order
by the polynomial masking scheme [22]. Sensitive variables are masked using
Shamir’s secret sharing scheme [23] and computations on the resulting shares
are performed using the BGW secure multi-party computation protocol [2].
In Shamir’s scheme, a secret $Z \in K \equiv \mathbb{F}_{2^m}$ is shared among $n < 2^m$ players
such that $d+1$ players are needed to reconstruct $Z$. To this end, a dealer generates
a degree-$d$ polynomial $P_Z(X) \in K[X]$ with constant term $Z$ and secret, random
coefficients $a_i$:
$$P_Z(X) = Z + \sum_{i=1}^{d} a_i X^i$$
When working in the field $K$, we will denote binary addition and field
multiplication by $+$ and $\cdot$ respectively.
This polynomial is then evaluated in $n$ distinct, non-zero elements
$\alpha_1, \ldots, \alpha_n \in K$, which are called the public coefficients and are
available to all players. Lastly, each resulting value $Z_i = P_Z(\alpha_i)$ is
distributed to its corresponding player $i$. The secret $Z$ can be reconstructed
using the first row $(\lambda_1, \ldots, \lambda_n)$ of the inverse of the
$(n \times n)$ Vandermonde matrix $(\alpha_i^j)_{1 \le i,j \le n}$ as:
$$Z = \sum_{i=1}^{n} \lambda_i Z_i$$
This is exemplified for the second-order in Appendix B.
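The sharing and reconstruction above can be sketched in a few lines for $K = \mathbb{F}_{2^4}$. This is an illustration under stated assumptions, not the authors' implementation: the field is built with $x^4 + x + 1$ (the polynomial used in Section 3.1), the reconstruction weights are computed as Lagrange coefficients at $X = 0$ (the usual way to recover the constant term, assumed here to play the role of the $\lambda_i$), and the function names and the public coefficients used in the usage note are hypothetical.

```python
import secrets


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^14 (since a^15 = 1)."""
    r = 1
    for _ in range(14):
        r = gf_mul(r, a)
    return r


def share(z, alphas, d):
    """Evaluate P_Z(X) = Z + sum_{i=1..d} a_i X^i at every public coefficient."""
    coeffs = [z] + [secrets.randbelow(16) for _ in range(d)]
    shares = []
    for alpha in alphas:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = gf_mul(acc, alpha) ^ c
        shares.append(acc)
    return shares


def lagrange_at_zero(alphas):
    """lambda_i = prod_{j != i} alpha_j / (alpha_i + alpha_j)."""
    lambdas = []
    for i, a_i in enumerate(alphas):
        lam = 1
        for j, a_j in enumerate(alphas):
            if j != i:
                lam = gf_mul(lam, gf_mul(a_j, gf_inv(a_i ^ a_j)))
        lambdas.append(lam)
    return lambdas


def reconstruct(shares, alphas):
    """Recover Z = sum_i lambda_i * Z_i."""
    z = 0
    for lam, s in zip(lagrange_at_zero(alphas), shares):
        z ^= gf_mul(lam, s)
    return z
```

For example, with the illustrative choice `alphas = [1, 2, 3]` and `d = 1`, `reconstruct(share(z, alphas, 1), alphas)` returns `z`, and any two of the three shares already suffice.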
BGW’s protocol defines how to securely operate on the shares. We can dis-
tinguish between operations that can be processed by all players independently
and operations that need communication between the players. Multiplication
of a share and a constant, addition of a share and a constant and addition of
two shares can be processed by each player independently. As a result, these
operations can be implemented straightforwardly. Multiplication of two shares,
which is referred to as shared multiplication, requires the players to exchange
information, which complicates its secure execution. This operation has to be
performed in three steps [22]:
1. Each player multiplies its shares, resulting in a 2d-degree polynomial.
2. Each player masks the result of the previous multiplication and sends these
shares to all other players.
3. Each player reconstructs the result by interpolation and evaluation in the
public coefficients.
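The three steps can be sketched on plain Python lists, with one list entry per player. This is a functional model only: it ignores the temporal separation that the hardware needs, and the helper names, the field polynomial $x^4 + x + 1$ and the Lagrange-style computation of the reconstruction weights are assumptions made for illustration.

```python
import secrets


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^14."""
    r = 1
    for _ in range(14):
        r = gf_mul(r, a)
    return r


def share(z, alphas, d):
    """Shamir-share z with a fresh random degree-d polynomial."""
    coeffs = [z] + [secrets.randbelow(16) for _ in range(d)]
    out = []
    for alpha in alphas:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = gf_mul(acc, alpha) ^ c
        out.append(acc)
    return out


def lagrange_at_zero(alphas):
    """Interpolation weights for the constant term."""
    lambdas = []
    for i, a_i in enumerate(alphas):
        lam = 1
        for j, a_j in enumerate(alphas):
            if j != i:
                lam = gf_mul(lam, gf_mul(a_j, gf_inv(a_i ^ a_j)))
        lambdas.append(lam)
    return lambdas


def reconstruct(shares, alphas):
    z = 0
    for lam, s in zip(lagrange_at_zero(alphas), shares):
        z ^= gf_mul(lam, s)
    return z


def shared_mul(xs, ys, alphas, d):
    """Return degree-d shares of X*Y from degree-d shares of X and Y."""
    n = len(alphas)
    if n <= 2 * d:
        raise ValueError("need n > 2d players")
    lam = lagrange_at_zero(alphas)
    # Step 1: local products; they lie on a polynomial of degree 2d.
    w = [gf_mul(x, y) for x, y in zip(xs, ys)]
    # Step 2: player i re-shares w[i]; sent[i][j] goes from player i to player j.
    sent = [share(w_i, alphas, d) for w_i in w]
    # Step 3: player j combines everything it received.
    result = []
    for j in range(n):
        acc = 0
        for i in range(n):
            acc ^= gf_mul(lam[i], sent[i][j])
        result.append(acc)
    return result
```

Step 3 works because the combination of the re-sharing polynomials is itself a degree-$d$ polynomial whose constant term is the interpolated value of the degree-$2d$ product polynomial at zero, which is why more than $2d$ players are needed.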
When the square of a share is desired, the shared multiplication can be omitted
when the following conditions are imposed [22]:
– The public coefficients $\alpha_i$ are distinct and non-zero.
– The public coefficients $\alpha_i$ are stable over the Frobenius automorphism: for
every $\alpha_i$, there exists an $\alpha_j$ such that $\alpha_j = \alpha_i^2$.
Each player can then independently perform the squaring on its own share, but
a reordering of the shares is needed between player $i$ and player $j$ when $i \neq j$ to
keep the right public coefficient linked to its corresponding player.
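A short sketch of this local squaring with reordering follows. The public coefficients used in the usage note, {1, 6, 7} for three players and {1, 2, 4, 3, 5} for five, are illustrative Frobenius-stable sets in $\mathbb{F}_{2^4}$ built with $x^4 + x + 1$; the text does not state which coefficients the authors chose, and all function names are hypothetical.

```python
import secrets


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^14."""
    r = 1
    for _ in range(14):
        r = gf_mul(r, a)
    return r


def share(z, alphas, d):
    """Shamir-share z with a fresh random degree-d polynomial."""
    coeffs = [z] + [secrets.randbelow(16) for _ in range(d)]
    out = []
    for alpha in alphas:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = gf_mul(acc, alpha) ^ c
        out.append(acc)
    return out


def reconstruct(shares, alphas):
    """Interpolate the constant term from all given shares."""
    z = 0
    for i, a_i in enumerate(alphas):
        lam = 1
        for j, a_j in enumerate(alphas):
            if j != i:
                lam = gf_mul(lam, gf_mul(a_j, gf_inv(a_i ^ a_j)))
        z ^= gf_mul(lam, shares[i])
    return z


def shared_square(shares, alphas):
    """Square every share locally, then hand it to the player owning alpha_i^2."""
    out = [0] * len(alphas)
    for i, alpha in enumerate(alphas):
        # list.index raises ValueError if the set is not Frobenius-stable.
        j = alphas.index(gf_mul(alpha, alpha))
        out[j] = gf_mul(shares[i], shares[i])
    return out
```

The reordering is needed because $P_Z(\alpha_i)^2$ equals the polynomial with squared coefficients evaluated at $\alpha_i^2$, so the squared share of player $i$ is a valid share of $Z^2$ for the player holding $\alpha_j = \alpha_i^2$. No fresh randomness is consumed in this operation.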
To achieve glitch resistance with this masking scheme, two conditions need
to be fulfilled. Firstly, the number of players has to exceed twice the degree of
the polynomial, i.e. n > 2d. Secondly, each player has to leak independently of
all other players.
2.3 Cyclotomic Classes
The masking complexity of an S-box is defined in [4] as the minimal number of
nonlinear multiplications required to evaluate its polynomial. These nonlinear
multiplications correspond to shared multiplications.
When calculating a power $x^\alpha$ from another power $x^\beta$, a nonlinear
multiplication can be omitted if and only if $\alpha$ and $\beta$ lie in the same
cyclotomic class. A cyclotomic class is defined as follows.
Definition 1 (Cyclotomic Class). With $m \in \mathbb{N}$ and $\alpha \in [0; 2^m - 2]$, the
cyclotomic class $C_\alpha$ of $\alpha$ w.r.t. $m$ is defined as:
$$C_\alpha = \{\alpha \cdot 2^i \bmod 2^m - 1,\; i \in [0; m - 1]\}$$
For the PRESENT S-box, we work in field $\mathbb{F}_{2^4}$. Its corresponding cyclotomic classes
are:
$$C_0 = \{0\},\; C_1 = \{1, 2, 4, 8\},\; C_3 = \{3, 6, \mathrm{C}, 9\},\; C_5 = \{5, \mathrm{A}\},\; C_7 = \{7, \mathrm{E}, \mathrm{D}, \mathrm{B}\} \qquad (1)$$
An important property is that we can cycle through the elements of a cyclotomic
class by squaring, which can be performed independently by all players when the
conditions listed in Section 2.2 are fulfilled. As squaring is linear in $\mathbb{F}_{2^4}$, the
S-box complexity equals the number of different transitions between these classes
required to evaluate the S-box substitution function.
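Definition 1 translates directly into code. The sketch below enumerates the classes for a given $m$; the function names are chosen here for illustration, and the output is in decimal, whereas Equation (1) uses hexadecimal digits.

```python
def cyclotomic_class(alpha, m):
    """Return C_alpha = {alpha * 2^i mod (2^m - 1) : 0 <= i < m} as a sorted list."""
    return sorted({(alpha * 2 ** i) % (2 ** m - 1) for i in range(m)})


def all_classes(m):
    """Map the smallest element of each class to the class, for alpha in [0; 2^m - 2]."""
    classes = {}
    seen = set()
    for alpha in range(2 ** m - 1):
        if alpha not in seen:
            cls = cyclotomic_class(alpha, m)
            classes[alpha] = cls
            seen.update(cls)
    return classes
```

For `m = 4` this yields the five classes of Equation (1), with 10 to 14 standing for the hexadecimal digits A to E.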
3 Hardware Implementation
In this section, the hardware implementations of the first-order and second-order
glitch resistant PRESENT S-box are explained. First, the polynomial of the
S-box and its evaluation order are established. Then the detailed first-order glitch
resistant implementation is discussed. Afterwards, the modifications required to
achieve second-order glitch resistance are given. This section is concluded with
an overview of the implementation requirements.
3.1 Evaluation Order
The substitution of any 4-bit S-box can be expressed as a unique polynomial
over $\mathbb{F}_{2^4}$ with a degree of at most $2^4 - 1 = 15$. This polynomial can be obtained
by expanding the following expression [7]:
$$S(x) = \sum_{z \in \mathbb{F}_{2^4}} S(z)\left(1 + (x + z)^{15}\right) = \sum_{i=0}^{15} c_i x^i$$
Using the Mattson-Solomon polynomial, the coefficients $c_i$ of $S(x)$ can directly
be computed by:
$$c_i = \begin{cases}
S(0), & \text{if } i = 0 \\
\sum_{k=0}^{2^4-2} S(\alpha^k)\,\alpha^{-ki}, & \text{if } 1 \le i \le 2^4 - 2 \\
S(1) + \sum_{j=0}^{2^4-2} c_j, & \text{if } i = 2^4 - 1
\end{cases}$$
where $\alpha$ is a primitive element in $\mathbb{F}_{2^4}$.
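If we use $x^4 + x + 1$ as irreducible polynomial for the construction of $\mathbb{F}_{2^4}$, we
get the following polynomial for the PRESENT S-box given in Table 1:
$$S(x) = \mathrm{D}x^{14} + \mathrm{D}x^{13} + \mathrm{C}x^{12} + \mathrm{E}x^{11} + 9x^{10} + 9x^{9} + 7x^{8}
+ 4x^{7} + \mathrm{C}x^{6} + \mathrm{A}x^{5} + \mathrm{E}x^{4} + 7x^{3} + 7x^{2} + \mathrm{C}$$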
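The coefficient formula can be checked numerically. The sketch below computes the $c_i$ from Table 1 and compares them with the polynomial stated above. It assumes that $\alpha = x$ (the value 0x2) is taken as the primitive element, which holds for $x^4 + x + 1$; the function names are hypothetical.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Coefficients c_0 .. c_15 read off the polynomial stated in the text.
STATED = [0xC, 0x0, 0x7, 0x7, 0xE, 0xA, 0xC, 0x4,
          0x7, 0x9, 0x9, 0xE, 0xC, 0xD, 0xD, 0x0]


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_pow(a, e):
    """Raise a to the non-negative integer power e."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def sbox_coefficients(sbox, alpha=0x2):
    """Compute c_0 .. c_15 with the case distinction given in the text."""
    c = [0] * 16
    c[0] = sbox[0]
    for i in range(1, 15):
        acc = 0
        for k in range(15):
            # alpha^(-k*i) equals alpha^((-k*i) mod 15) because alpha^15 = 1.
            acc ^= gf_mul(sbox[gf_pow(alpha, k)], gf_pow(alpha, (-k * i) % 15))
        c[i] = acc
    c[15] = sbox[1]
    for j in range(15):
        c[15] ^= c[j]
    return c


def poly_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x^i with Horner's rule."""
    acc = 0
    for c_i in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c_i
    return acc
```

Under these assumptions `sbox_coefficients(SBOX)` agrees with `STATED`, and evaluating the result at every `x` reproduces Table 1. The coefficients of $x^{15}$ and $x$ are zero, so the polynomial has 14 terms.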
The evaluation order of this polynomial is an adaptation of the proposal by
Carlet et al. in [4] to reduce the required memory and area by processing
sequentially instead of in parallel. The block diagram of the PRESENT S-box evaluation
is depicted in Figure 1. The gray multipliers symbolize a field multiplication
with a constant, while the black multipliers represent a shared multiplication.
Starting from input $x$, squaring is consecutively carried out until all elements of
the cyclotomic class $C_1$ from Equation (1) are covered. The last element of that
class is then multiplied with $x$ to access cyclotomic class $C_3$, where all elements
are again obtained by squaring. After a multiplication with $x$, squaring is
performed again to reach all elements in $C_7$. To access the final cyclotomic class $C_5$,
a multiplication with $x^{11}$ is chosen, as multiplying our last obtained power with
$x$ would lead back to class $C_1$. The value $x^{11}$ therefore needs to be stored separately.
From this discussion it is apparent that a shared multiplier cannot be omitted.
As our primary design goal is low area, we choose to implement only a
shared multiplier and use it for all multiplications, squarings included. However, by evaluating
the polynomial this way, the designs can easily be extended with a dedicated
squaring circuit and benefit from a significant reduction of required randomness.
This extension is left as future work.
Fig. 1. Block diagram of the evaluation for the PRESENT S-box.
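The evaluation order of Section 3.1 can be traced on unshared values as a sanity check. The sketch below is one reading of the described sequence, not the hardware schedule itself: the operation labels in `SCHEDULE` are hypothetical, and exponents are tracked modulo 15 only to select the matching coefficient.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Coefficients c_0 .. c_15 of the S-box polynomial from Section 3.1.
COEFFS = [0xC, 0x0, 0x7, 0x7, 0xE, 0xA, 0xC, 0x4,
          0x7, 0x9, 0x9, 0xE, 0xC, 0xD, 0xD, 0x0]

SCHEDULE = ["sq", "sq", "sq",   # x^2, x^4, x^8            (class C1)
            "mul_x",            # x^8 * x = x^9            (enter C3)
            "sq", "sq", "sq",   # x^3, x^6, x^12
            "mul_x",            # x^12 * x = x^13          (enter C7)
            "sq", "sq", "sq",   # x^11 (stored), x^7, x^14
            "mul_x11",          # x^14 * x^11 = x^10       (enter C5)
            "sq"]               # x^5


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def eval_sbox(x):
    """Evaluate S(x) by walking through the cyclotomic classes in order."""
    power, exp = x, 1
    acc = COEFFS[0] ^ gf_mul(COEFFS[1], power)
    x11 = None
    for op in SCHEDULE:
        if op == "sq":
            power, exp = gf_mul(power, power), (2 * exp) % 15
        elif op == "mul_x":
            power, exp = gf_mul(power, x), (exp + 1) % 15
        else:  # "mul_x11": multiply by the separately stored x^11
            power, exp = gf_mul(power, x11), (exp + 11) % 15
        if exp == 11:
            x11 = power  # the extra intermediate value that must be kept
        acc ^= gf_mul(COEFFS[exp], power)  # constant multiplication and accumulation
    return acc
```

The schedule contains 13 operations, 10 squarings and 3 multiplications between different powers. When every one of them goes through the shared multiplier, all 13 consume fresh randomness; with a dedicated squaring circuit only the 3 multiplications would.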
3.2 First-Order Glitch Resistant PRESENT S-box
To achieve first-order glitch resistance, both conditions in Section 2.2 have to
be fulfilled. Namely, our sensitive variables need to be masked by a first-order
polynomial and need to be shared between three players with independent side-
channel leakage. In order to achieve this independent leakage, we choose to tem-
porally separate the players’ operations. After each operation, the intermediate
results are stored and left unaltered while another player is active. The design
is shown in Figure 2 and is similar to the AES S-box implementation from [18].
This design is compatible with all combinational finite field multipliers. The one
used in our implementations is given in Appendix A.
Shared multiplier. As pointed out in Section 2.2, the shared multiplication
differs from the other operations in that it needs communication between the
players. To achieve this, the computations are divided in two parts and the
communicated intermediate values are stored in registers.
Step 1 and Step 2 (Section 2.2) of the shared multiplication are performed
in the mult_el1_i blocks. With every shared multiplication, each player receives
a new random coefficient $a_i$ to remask the multiplication of its input shares
$t_i$. The reconstruction in Step 3 is handled by the mult_el2_i blocks once all
intermediate results are available.
The detailed working principle is described in series of clock cycles. Such a
series consists of six clock cycles and is related to the control signals em_i
($1 \le i \le 6$), which can be seen in Figure 2. During each series, a shared
multiplication is realized.
– The first clock cycle of a series enables signal em1. The two required inputs
for the shared multiplier are selected by selm1. At the same time, a new
random number $a_1$ is fed to the mult_el1_1 block. Together with this random
number, the fixed public coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ are used to remask the
multiplied input shares $t_1$.
– The same procedure is repeated on the second clock cycle using signal em2 in
block mult_el1_2 and on the third clock cycle using em3 in block mult_el1_3.
After the third clock cycle, all intermediate results are available.
– In the fourth clock cycle, by activating signal em4, the intermediate results
related to the first public coefficient $\alpha_1$ are stored in the registers $q_{1,1}$, $q_{2,1}$,
$q_{3,1}$. The combinatorial logic in block mult_el2_1 then performs the
reconstruction using $\lambda_1$, $\lambda_2$ and $\lambda_3$. This outputs the first share of the shared
multiplication. The result is not saved in this clock cycle; it is saved
at the start of the next series, with the activation of signal em1.
– In the fifth and sixth clock cycles, the same principles as in the fourth clock
cycle apply. The enable signal em5 handles the reconstruction related to
the second public coefficient $\alpha_2$ in block mult_el2_2 and em6 serves the
reconstruction related to the third public coefficient $\alpha_3$ in block mult_el2_3.
Note that, except for the registers, the shared multiplier is entirely combinatorial.
Therefore, the mult_el1_i and mult_el2_i blocks are only active when a new
value is assigned to their input registers. After one clock cycle, the intermediate
values reach their stable states and the blocks stay idle until their input registers
are changed again. By temporally separating the em_i signals with a carefully
designed control unit, we achieve the required temporal separation.
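The series of clock cycles described above can be summarised as a small schedule table. The sketch below is only a bookkeeping model of which enable signal and which block are active in each cycle. The function name is hypothetical, the underscores in the block names are a reading of the typeset names, and the parameter `n` generalises the three-player case to the five-player schedule of Section 3.3.

```python
def series_schedule(n=3):
    """Return (cycle, enable signal, active block) for one shared multiplication."""
    schedule = []
    # Cycles 1..n: player i multiplies and remasks its shares (Steps 1 and 2).
    for i in range(1, n + 1):
        schedule.append((i, f"em{i}", f"mult_el1_{i}"))
    # Cycles n+1..2n: the reconstruction for public coefficient i (Step 3).
    for i in range(1, n + 1):
        schedule.append((n + i, f"em{n + i}", f"mult_el2_{i}"))
    return schedule


for cycle, signal, block in series_schedule(3):
    print(cycle, signal, block)
```

Each block appears exactly once per series, which is the temporal separation that the control unit has to enforce.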
Input selection. The right inputs for the shared multiplier are selected by the
multiplexers in the ctrl_el_i blocks. A glitch on the select signal of a multiplexer
can temporarily change the inputs of the shared multiplier and induce processing
in a player that is supposed to be idle. This would result in an overlap of leakages
of different players and would eradicate the temporal separation. To avoid this,
these selm_i signals are synchronised. As was noted in Section 3.1, we need to
store one extra intermediate value $x^{11}$. When the shares of this value are output
at the mult_el2_i blocks, the es1, es2 and es3 signals follow the levels of the
em1, em2 and em3 signals to store the shares of $x^{11}$ in separate registers.
Addition and accumulation. To calculate the polynomial, the powers of $x$
need to be multiplied with a constant and accumulated with the previously
obtained results. This is handled by the add_acc_el_i blocks. When the shares
of a desired power of $x$ are ready at the outputs of the shared multiplier, the
ea_i signals activate with the corresponding em_i signals. With the activation
of an ea_i signal, a new coefficient chosen by selcoeff is fed to an input of the
add_acc_el_i multiplier, resulting in the right multiplication of a constant and
its corresponding power of $x$. In the first series of clock cycles, the constant value
C of the polynomial is added to an empty register using the sela_i signal, which
activates with its corresponding em_i signal. In all following series of clock cycles,
the register output is chosen to accumulate the results. The eo_i signal enables
the output share of player $i$ when the register holds the final value. This signal
also activates with its corresponding em_i signal.
Fig. 2. Architecture diagram for the first-order PRESENT implementation.
3.3 Second-Order Glitch Resistant PRESENT S-box
We will now discuss how to extend our first-order design to the second order.
Again, both conditions in Section 2.2 need to be fulfilled. To provide
second-order glitch resistance, our sensitive variables are now masked using a
second-order polynomial and shared among five players. We again choose temporal
separation to decouple the leakages of the different players. The operations in
this (5,2)-sharing scheme are detailed in Appendix B. Figure 5 in Appendix C
shows the resulting architecture diagram.
Shared multiplier. The mult_el1_i blocks now require two random coefficients
instead of one to mask the multiplication of the inputs. Furthermore, the
polynomial is evaluated in five public coefficients, and their squared
values are needed. When hardcoding the public coefficients and their squares, we
additionally require seven multiplications, seven additions and one register. Each
player now requires five instead of three registers to share the intermediate
results. The mult_el2_i blocks need two extra multiplications and two extra
additions to perform the reconstruction in the (5,2)-sharing scheme.
The control schedule is changed to incorporate five players. The same
principles from Section 3.2 apply, but we need 10 em_i signals, the first five to control
the mult_el1_i blocks and the last five to store the intermediate values in the
registers.
Input selection, addition and accumulation. The only change made in
these operations is the extension from three ctrl_el (resp. add_acc_el) blocks
to five.
The security against second-order DPA in the presence of glitches of this
implementation can theoretically be explained as follows. As a second-order
polynomial is used to divide the shares among five players, the shares of at least three
players are required to interpolate the masked secret. Combining up to two
observations of intermediate variables will therefore not yield enough information
to reveal the secret variable. Furthermore, as the computations of each player
are temporally separated, the information leaked by glitches is confined to the
share of that player only and is not influenced by the shares of other players.
This argument is valid for all orders when appropriate changes to the
number of players are considered.
3.4 Implementation Requirements
The total area in NAND gate equivalents (GEs) is 3594 GE and 8338 GE
for the first- and second-order glitch resistant implementations respectively. The
largest contributions come from the shared multiplier (37.8% and 59.6% for the
first- and second-order implementations respectively) and the control unit
(41.8% and 25.7%). The detailed area requirements of the different blocks are given in
Table 3 in Appendix D. The results are obtained from Synopsys 2010.03 using
the NanGate 45nm Open Cell Library [1].
The first-order implementation requires 89 clock cycles from the activation
of the request signal to the output of all shares. For the second-order
implementation, this number becomes 149 clock cycles. The secure evaluation of the
first-order PRESENT S-box requires 156 bits of randomness. If a squaring module
is used, this can drop to 36 bits at the cost of additional area. For secure
evaluation of the second-order PRESENT S-box, the required randomness changes to
520 bits (resp. 120 bits when a squaring module is used). As all public coefficients
should be distinct and non-zero, up to 15 players can be accommodated. By
imposing the condition that n > 2d, this leads to a maximum of a seventh-order
glitch resistant implementation for the PRESENT S-box. For all possible orders
d of glitch resistance, the required amount of randomness and number of clock
cycles are summarized in Table 2.
Table 2. Number of clock cycles and randomness required for a dth-order glitch
resistant PRESENT S-box implementation.
Number of clock cycles                 | 30(2d + 1) - 1
Randomness (bits)                      | 52d(2d + 1)
Randomness with squaring module (bits) | 12d(2d + 1)
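The expressions of Table 2 are easy to tabulate. The sketch below evaluates them for a given order d; the function names are chosen for illustration, and the formulas are taken verbatim from the table.

```python
def players(d):
    """Smallest number of players satisfying n > 2d."""
    return 2 * d + 1


def clock_cycles(d):
    """Clock cycles for a dth-order glitch resistant S-box (Table 2)."""
    return 30 * (2 * d + 1) - 1


def randomness_bits(d, squaring_module=False):
    """Random bits per S-box evaluation (Table 2)."""
    per_unit = 12 if squaring_module else 52
    return per_unit * d * (2 * d + 1)


for d in range(1, 8):
    print(d, players(d), clock_cycles(d),
          randomness_bits(d), randomness_bits(d, squaring_module=True))
```

For d = 1 and d = 2 this reproduces the 89 and 149 clock cycles and the 156, 36, 520 and 120 bits quoted in the text. One consistent reading of the constants, offered here as an interpretation rather than a statement from the paper, is that each of the 2d + 1 players draws d fresh 4-bit coefficients per shared multiplication, and that 13 operations (or 3 with a squaring module) use the shared multiplier.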
4 SCA evaluation
In this section we provide experimental evidence that our implementations
provide a reasonable guarantee against typical power analysis attacks. We perform
leakage detection tests on the PRESENT S-box, implemented on a SASEBO-G
board [21]. The board is externally clocked with a stable, relatively low-frequency
clock source of 3.072 MHz. All the randomness required for the computations is
generated by an AES-based PRNG on the control FPGA. All the tests were
performed with 1M traces unless explicitly stated otherwise.
For our evaluation, we use the non-specific fixed-vs-random methodology
of [6, 9]. In a nutshell, the leakage detection test assesses whether the means
of power consumption traces, conditioned on any intermediate, are equal or
not. In the context of first-order masking, this means whether the masking is
sound or not. We stress that by using a non-specific test, we are targeting all
intermediates appearing during the computation of an S-box. This allows us to
test the implementation against a wide range of leakages, without assuming how
the implementation may leak.
The original methodology starts by taking two sets of measurements
corresponding to fixed plaintext and random plaintext. Then, a hypothesis test is
applied time sample per time sample to test whether the means of the two
populations are the same or not. Normally, a Student's t-test is applied. Having set
a significance level beforehand, the result of the test is directly interpretable in
terms of probability. In our case, an absolute value of the t-test statistic beyond 4.5 means
that there is leakage with high probability. For details on the test, we refer to
Appendix E. For our purposes of testing the higher-order security, we adapt the
methodology to analyze higher-order moments in univariate and bivariate
distributions (two time samples jointly analyzed). This is achieved by preprocessing
the power traces through a suitable combination function. In our case, we use
the centered product.
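As a rough sketch of the per-sample test, the following pure-Python code computes Welch's t statistic for one time sample and applies the 4.5 threshold, with optional centering and exponentiation for the higher-order univariate tests. It is not the evaluation code used for the measurements: the function names are hypothetical, and centering each set around its own mean is an assumption about the preprocessing.

```python
from math import sqrt


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / sqrt(va / na + vb / nb)


def center_and_power(samples, order):
    """Center one set around its own mean and raise each value to `order`."""
    mu = sum(samples) / len(samples)
    return [(x - mu) ** order for x in samples]


def leaks(fixed, random_, order=1, threshold=4.5):
    """Fixed-vs-random test on one time sample at the given statistical order."""
    if order > 1:
        fixed = center_and_power(fixed, order)
        random_ = center_and_power(random_, order)
    return abs(welch_t(fixed, random_)) > threshold
```

In practice the same computation is repeated for every time sample of the traces, which gives the t-statistic curves shown in the figures.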
We begin with a univariate analysis of the first-order protected
implementation. As a first sanity check of our experimental setup, we performed a
univariate first-order test with the PRNG switched off, thus deliberately disabling the
masking. The result of the t-test on the unmasked first-order implementation is
given in Figure 6 in Appendix E. This clearly shows that the implementation is
leaking since the t-test statistic trace exceeds the confidence threshold C = ±4.5
in several clock cycles, which is expected as the masking is inactive. If we repeat
the experiment with the PRNG enabled, the t-test statistic never exceeds the
predefined threshold as the top left corner of Figure 3 indicates.
We repeated the test on centered and squared traces. This is equivalent to testing
whether there is information leakage on the variances. Note that the first-order
protected implementation is expected to leak in the second moment, as Figure 3
indicates. This only provides us with the evidence that we indeed have enough
traces to show that the first-order attack is more expensive in terms of traces
than higher-order ones, and thus our goal of first-order security is attained.
We proceeded with a univariate analysis of the second-order glitch resistant
implementation. The process follows the lines of the first-order protected imple-
mentations and the results are again shown in Figure 3. We can see that the
implementation is indeed first- and second-order univariate secure up to 1M
traces. The implementation leaks in the third-order but this poses no problem
to the security claims.
[Figure 3: t-test statistic versus sample index. Left column: first-order
implementation; right column: second-order implementation. Panels from top to
bottom: trigger, first-order univariate, second-order univariate, third-order
univariate.]
Fig. 3. Results of the first-, second- and third-order univariate analysis on the first-
and second-order PRESENT S-box implementations.
We also performed a preliminary bivariate analysis. To this end, we
preprocess each trace by first centering around a mean and then multiplying all possible
pairs of time samples within a trace. This means that an $m$-sample trace is
expanded into a $\binom{m}{2}$-sample trace, which results in a substantial increase in the
computational and memory requirements. Then, a leakage test is performed on
the preprocessed traces. To speed up the bivariate analysis, we opted for
compressing the traces by a factor of 100.
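The bivariate preprocessing can be sketched as follows. The text does not say how the traces were compressed, so averaging consecutive blocks of samples is an assumption made here for illustration, as are the function names.

```python
from itertools import combinations


def compress(trace, factor):
    """Average consecutive blocks of `factor` samples (assumed compression method)."""
    return [sum(trace[k:k + factor]) / factor
            for k in range(0, len(trace) - factor + 1, factor)]


def centered_product(traces):
    """Expand every m-sample trace into its m-choose-2 centered products."""
    n, m = len(traces), len(traces[0])
    # Mean of each time sample over all traces.
    means = [sum(t[k] for t in traces) / n for k in range(m)]
    return [[(t[i] - means[i]) * (t[j] - means[j])
             for i, j in combinations(range(m), 2)]
            for t in traces]
```

The quadratic growth of the expanded traces explains the compression step: reducing the trace length by a factor of 100 shrinks the number of sample pairs by roughly a factor of 10,000.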
As in the univariate case, we first carry out a sanity check to verify the
soundness of this approach by performing a bivariate second-order analysis on
the first-order secure implementation. This is expected to leak, and the results
of Figure 4 confirm this. We obtained t-test statistic values within the region of
interest larger than 20, clearly indicating second-order bivariate leakage. These
leakages are close to the diagonal, meaning that leakage occurs by combining
samples from adjacent clock cycles. The leakage is visible with 200k traces.
We repeated the same experiment with the second-order secure
implementation and found no value exceeding our confidence threshold of 4.5 with 1M
traces. This provides some evidence that the second-order implementation indeed
may be secure. However, we cannot provide a definite answer unless
we exhaustively cover all possible pairs of time samples (without compression),
something that is beyond our current computational reach.
Fig. 4. Result of the second-order bivariate analysis on the first-order PRESENT S-box
implementation.
5 Conclusions
We implemented a first- and second-order glitch resistant PRESENT S-box using
the polynomial masking scheme presented in [22]. We evaluated these
implementations with both univariate and bivariate leakage detection tests, which
support the claimed SCA resistance. Our implementations resulted in 3594 GE for
the first-order and in 8338 GE for the second-order implementation.
Acknowledgements
This work has been supported in part by the Research Council of KU Leuven
(OT/13/071 and GOA/11/007), by the FWO (g.0550.12) and by the Hercules
foundation (AKUL/11/19). Begül Bilgin was partially supported by the FWO
project G0B4213N. Oscar Reparaz is funded by a PhD fellowship of the Fund
for Scientific Research - Flanders (FWO).
References
1. Nangate open cell library. http://www.nangate.com/.
2. M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for non-
cryptographic fault-tolerant distributed computation. In Proceedings of the twenti-
eth annual ACM symposium on Theory of computing, pages 1–10, New York, NY,
USA, 1988. ACM.
3. A. Bogdanov, L. R. Knudsen, G. Leander, C. Paar, A. Poschmann, M. J. B. Rob-
shaw, Y. Seurin, and C. Vikkelsoe. PRESENT: An Ultra-Lightweight Block Cipher.
In P. Paillier and I. Verbauwhede, editors, CHES, volume 4727 of Lecture Notes in
Computer Science, pages 450–466. Springer, 2007.
4. C. Carlet, L. Goubin, E. Prouff, M. Quisquater, and M. Rivain. Higher-Order
Masking Schemes for S-Boxes. In A. Canteaut, editor, FSE, volume 7549 of Lecture
Notes in Computer Science, pages 366–384. Springer, 2012.
5. S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi. Towards sound approaches to
counteract power-analysis attacks. In M. J. Wiener, editor, CRYPTO, volume 1666
of Lecture Notes in Computer Science, pages 398–412. Springer, 1999.
6. J. Cooper, E. DeMulder, G. Goodwill, J. Jaffe, G. Kenworthy, and P. Rohatgi.
Test Vector Leakage Assessment (TVLA) methodology in practice. International
Cryptographic Module Conference, 2013.
http://icmc-2013.org/wp/wp-content/uploads/2013/09/goodwillkenworthtestvector.pdf.
7. Y. Crama and P. L. Hammer. Boolean Models and Methods in Mathematics, Com-
puter Science, and Engineering. Cambridge University Press, New York, NY, USA,
1st edition, 2010.
8. G. Goodwill, B. Jun, J. Jaffe, and P. Rohatgi. A testing methodology for
side-channel resistance validation. NIAT, 2011.
9. G. Goodwill, B. Jun, J. Jaffe, and P. Rohatgi. A testing methodology
for side channel resistance validation. NIST non-invasive attack testing
workshop, 2011.
http://csrc.nist.gov/news_events/non-invasive-attack-testing-workshop/papers/08_Goodwill.pdf.
10. L. Goubin and J. Patarin. DES and differential power analysis (the "duplication"
method). In Ç. K. Koç and C. Paar, editors, CHES, volume 1717 of Lecture Notes in
Computer Science, pages 158–172. Springer, 1999.
11. J. Guo, T. Peyrin, and A. Poschmann. The PHOTON family of lightweight hash
functions. In P. Rogaway, editor, CRYPTO, volume 6841 of Lecture Notes in
Computer Science, pages 222–239. Springer, 2011.
12. J. Guo, T. Peyrin, A. Poschmann, and M. J. B. Robshaw. The LED block cipher.
In B. Preneel and T. Takagi, editors, CHES, volume 6917 of Lecture Notes in
Computer Science, pages 326–341. Springer, 2011.
13. ISO/IEC. ISO/IEC 29192-2. Information technology – Security techniques –
Lightweight cryptography – Part 2: Block ciphers. ISO/IEC, 2012.
14. P. C. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In Proceedings of
the 19th Annual International Cryptology Conference on Advances in Cryptology,
CRYPTO ’99, pages 388–397, London, UK, 1999. Springer-Verlag.
15. S. Mangard, T. Popp, and B. M. Gammel. Side-channel leakage of masked CMOS
gates. In A. Menezes, editor, CT-RSA, volume 3376 of Lecture Notes in Computer
Science, pages 351–365. Springer, 2005.
16. S. Mangard, N. Pramstaller, and E. Oswald. Successfully attacking masked AES
hardware implementations. In J. R. Rao and B. Sunar, editors, CHES, volume
3659 of Lecture Notes in Computer Science, pages 157–171. Springer, 2005.
17. T. S. Messerges. Using second-order power analysis to attack DPA resistant
software. In Ç. K. Koç and C. Paar, editors, CHES, volume 1965 of Lecture Notes in
Computer Science, pages 238–251. Springer, 2000.
18. A. Moradi and O. Mischke. On the simplicity of converting leakages from multi-
variate to univariate - (case study of a glitch-resistant masking scheme). In CHES,
pages 1–20, 2013.
19. S. Nikova, C. Rechberger, and V. Rijmen. Threshold implementations against
side-channel attacks and glitches. In P. Ning, S. Qing, and N. Li, editors, ICICS,
volume 4307 of Lecture Notes in Computer Science, pages 529–545. Springer, 2006.
20. A. Poschmann, S. Ling, and H. Wang. 256 bit standardized crypto for 650 GE -
GOST revisited. In S. Mangard and F.-X. Standaert, editors, CHES, volume 6225
of Lecture Notes in Computer Science, pages 219–233. Springer, 2010.
21. Research Center for Information Security, National Institute of Advanced Indus-
trial Science and Technology. Side-channel Attack Standard Evaluation Board
SASEBO-G Specification.
22. T. Roche and E. Prouff. Higher-order glitch free implementation of the AES using
Secure Multi-Party Computation protocols - Extended version. J. Cryptographic
Engineering, 2(2):111–127, 2012.
23. A. Shamir. How to Share a Secret. Communications of the ACM, 22(11):612–613,
1979.
24. K. Tiri and I. Verbauwhede. A logic level design methodology for a secure DPA
resistant ASIC or FPGA implementation. pages 246–251, 2004.
25. M. Tunstall and O. Benoît. Efficient use of random delays in embedded software.
In D. Sauveron, C. Markantonakis, A. Bilas, and J.-J. Quisquater, editors, WISTP,
volume 4462 of Lecture Notes in Computer Science, pages 27–38. Springer, 2007.
26. B. L. Welch. The Generalization of ‘Student’s’ Problem when Several Different
Population Variances are Involved. Biometrika, 34(1/2):28–35, 1947.
Appendix A: Finite-field Multiplier
The combinatorial finite-field multiplier in $\mathbb{F}_{2^4}$ used in our
implementation is based on the algebraic normal form. The 4-bit inputs
$A = (a_3, a_2, a_1, a_0)$ and $B = (b_3, b_2, b_1, b_0)$ result in the output
$C = (c_3, c_2, c_1, c_0)$ by the following bitwise operations:
\[
\begin{aligned}
c_0 &= (a_0 b_0) + (a_1 b_3) + (a_2 b_2) + (a_3 b_1) \\
c_1 &= (a_0 b_1) + (a_1 b_0) + (a_1 b_3) + (a_2 b_2) + (a_2 b_3) + (a_3 b_1) + (a_3 b_2) \\
c_2 &= (a_0 b_2) + (a_1 b_1) + (a_2 b_0) + (a_2 b_3) + (a_3 b_2) + (a_3 b_3) \\
c_3 &= (a_0 b_3) + (a_1 b_2) + (a_2 b_1) + (a_3 b_0) + (a_3 b_3)
\end{aligned}
\]
where $A$, $B$ and $C$ are in little-endian notation.
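The four equations above can be checked against ordinary polynomial
multiplication. The following Python sketch does so under the assumption that
the field is built with the reduction polynomial $x^4 + x + 1$, which is the
polynomial these equations turn out to be consistent with. The function names
`anf_mul` and `poly_mul` are illustrative and do not come from the
implementation described in this paper.

```python
def anf_mul(a, b):
    """Bit-level multiplier following the Appendix A equations (a0, b0 are LSBs)."""
    a0, a1, a2, a3 = [(a >> k) & 1 for k in range(4)]
    b0, b1, b2, b3 = [(b >> k) & 1 for k in range(4)]
    c0 = (a0 & b0) ^ (a1 & b3) ^ (a2 & b2) ^ (a3 & b1)
    c1 = ((a0 & b1) ^ (a1 & b0) ^ (a1 & b3) ^ (a2 & b2)
          ^ (a2 & b3) ^ (a3 & b1) ^ (a3 & b2))
    c2 = (a0 & b2) ^ (a1 & b1) ^ (a2 & b0) ^ (a2 & b3) ^ (a3 & b2) ^ (a3 & b3)
    c3 = (a0 & b3) ^ (a1 & b2) ^ (a2 & b1) ^ (a3 & b0) ^ (a3 & b3)
    return c0 | (c1 << 1) | (c2 << 2) | (c3 << 3)


def poly_mul(a, b, modulus=0b10011):
    """Reference: carry-less product reduced modulo x^4 + x + 1 (assumed)."""
    r = 0
    for k in range(4):
        if (b >> k) & 1:
            r ^= a << k
    for k in (6, 5, 4):  # clear the high bits from the top down
        if (r >> k) & 1:
            r ^= modulus << (k - 4)
    return r


# Exhaustive comparison over all 256 input pairs.
MATCHES = all(anf_mul(a, b) == poly_mul(a, b)
              for a in range(16) for b in range(16))
```

An exhaustive comparison is cheap here because the field has only 16 elements.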
Appendix B: Polynomial Masking Scheme with
(5,2)-sharing
This section lists the equations for the construction of, reconstruction from and
operations on the shares when considering a (5,2)-sharing. In what follows, all
additions and multiplications are in $\mathbb{F}_{2^4}$.
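First, five distinct non-zero elements in $\mathbb{F}_{2^4}$ need to be chosen.
These are referred to as the public coefficients $\alpha_i$, $1 \le i \le 5$.
Together with these points, the first row $(\lambda_1, \ldots, \lambda_5)$ of the
inverse Vandermonde matrix $(\alpha_i^j)_{1 \le i,j \le 5}$ is needed. These
interpolation coefficients can be calculated as:
\[
\begin{aligned}
\lambda_1 &= \alpha_2(\alpha_1+\alpha_2)^{-1}\,\alpha_3(\alpha_1+\alpha_3)^{-1}\,\alpha_4(\alpha_1+\alpha_4)^{-1}\,\alpha_5(\alpha_1+\alpha_5)^{-1} \\
\lambda_2 &= \alpha_1(\alpha_2+\alpha_1)^{-1}\,\alpha_3(\alpha_2+\alpha_3)^{-1}\,\alpha_4(\alpha_2+\alpha_4)^{-1}\,\alpha_5(\alpha_2+\alpha_5)^{-1} \\
\lambda_3 &= \alpha_1(\alpha_3+\alpha_1)^{-1}\,\alpha_2(\alpha_3+\alpha_2)^{-1}\,\alpha_4(\alpha_3+\alpha_4)^{-1}\,\alpha_5(\alpha_3+\alpha_5)^{-1} \\
\lambda_4 &= \alpha_1(\alpha_4+\alpha_1)^{-1}\,\alpha_2(\alpha_4+\alpha_2)^{-1}\,\alpha_3(\alpha_4+\alpha_3)^{-1}\,\alpha_5(\alpha_4+\alpha_5)^{-1} \\
\lambda_5 &= \alpha_1(\alpha_5+\alpha_1)^{-1}\,\alpha_2(\alpha_5+\alpha_2)^{-1}\,\alpha_3(\alpha_5+\alpha_3)^{-1}\,\alpha_4(\alpha_5+\alpha_4)^{-1}
\end{aligned}
\]
Here, the multiplicative inverse in our field is represented by $(\cdot)^{-1}$.
The elements $\alpha_i$ and $\lambda_i$, $1 \le i \le 5$, are publicly available
to all five players.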
Sharing a value $X$ requires two secret and random coefficients $a_1, a_2$ and the
public coefficients $\alpha_i$, $1 \le i \le 5$. The resulting shares $X_i$ are
calculated as:
\[
X_i = X + (a_1\alpha_i) + (a_2\alpha_i^2), \quad \text{with } 1 \le i \le 5
\]
Each player receives exactly one share $X_i$ and has no access to any other share.
Reconstruction of the secret value $X$ requires the interpolation coefficients
$\lambda_i$, $1 \le i \le 5$:
\[
X = (X_1\lambda_1) + (X_2\lambda_2) + (X_3\lambda_3) + (X_4\lambda_4) + (X_5\lambda_5)
\]
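Coefficient computation, sharing and reconstruction can be exercised together in
a few lines. The Python sketch below is only an illustration: the reduction
polynomial $x^4 + x + 1$, the public points `ALPHAS = [1, 2, 3, 4, 5]` and all
function names are assumptions made for the example, not values taken from the
hardware design.

```python
import secrets

ALPHAS = [1, 2, 3, 4, 5]  # illustrative public points (assumption)


def gf_mul(a, b):
    """Multiply in GF(2^4), assuming the reduction polynomial x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    """Multiplicative inverse by exhaustive search (the field has 15 units)."""
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def interpolation_coefficients(alphas):
    """Product over k != i of alpha_k * (alpha_i + alpha_k)^-1 for each i."""
    coeffs = []
    for i, a_i in enumerate(alphas):
        c = 1
        for k, a_k in enumerate(alphas):
            if k != i:
                c = gf_mul(c, gf_mul(a_k, gf_inv(a_i ^ a_k)))
        coeffs.append(c)
    return coeffs


def share(x, alphas):
    """X_i = X + a1*alpha_i + a2*alpha_i^2 with fresh random a1, a2."""
    a1, a2 = secrets.randbelow(16), secrets.randbelow(16)
    return [x ^ gf_mul(a1, al) ^ gf_mul(a2, gf_mul(al, al)) for al in alphas]


def reconstruct(shares, coeffs):
    """X = sum over i of X_i * lambda_i (addition is XOR)."""
    x = 0
    for s, c in zip(shares, coeffs):
        x ^= gf_mul(s, c)
    return x


LAMBDAS = interpolation_coefficients(ALPHAS)
```

Calling `reconstruct(share(x, ALPHAS), LAMBDAS)` returns `x` for every field
element, independently of the random coefficients drawn inside `share`.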
To describe the operations, a constant value will be represented as $c$ and two
secret values as $X$ and $Y$. Their (5,2)-sharings are given by $X_i$ and $Y_i$,
$1 \le i \le 5$. Both are masked with the same public coefficients but use
independent random secret coefficients $a_1, a_2$ and $b_1, b_2$.
Addition with a constant can be achieved by each player independently as:
\[
\begin{aligned}
Z_i &= X_i + c \\
    &= (X + (a_1\alpha_i) + (a_2\alpha_i^2)) + c \\
    &= (X + c) + (a_1\alpha_i) + (a_2\alpha_i^2), \quad \text{with } 1 \le i \le 5
\end{aligned}
\]
The resulting shares of the addition represent the correct new secret $Z = X + c$.
Multiplication with a constant is performed in a similar way and can again
be achieved by each player independently:
\[
\begin{aligned}
Z_i &= X_i c \\
    &= (X + (a_1\alpha_i) + (a_2\alpha_i^2))c \\
    &= (Xc) + (a_1 c\,\alpha_i) + (a_2 c\,\alpha_i^2), \quad \text{with } 1 \le i \le 5
\end{aligned}
\]
Considering $(a_1 c)$ and $(a_2 c)$ as the new coefficients of the second-order
polynomial, the shares $Z_i$, $1 \le i \le 5$, represent the desired output
$Z = Xc$. Note that the reconstruction of the masked secret variable does not
depend on the polynomial coefficients $a_1, a_2$, but on the interpolation
coefficients $\lambda_i$, which only depend on the public coefficients $\alpha_i$.
Addition of two shared secrets is executed in the following way:
\[
\begin{aligned}
Z_i &= X_i + Y_i \\
    &= (X + (a_1\alpha_i) + (a_2\alpha_i^2)) + (Y + (b_1\alpha_i) + (b_2\alpha_i^2)) \\
    &= (X + Y) + ((a_1 + b_1)\alpha_i) + ((a_2 + b_2)\alpha_i^2), \quad \text{with } 1 \le i \le 5
\end{aligned}
\]
With $a_1 + b_1$ and $a_2 + b_2$ as the new polynomial coefficients, the resulting
shares mask the desired new secret variable $Z = X + Y$.
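The three linear operations above are purely local: every player touches only
its own share. A minimal Python sketch of this follows, again assuming the
reduction polynomial $x^4 + x + 1$ and the illustrative public points
`[1, 2, 3, 4, 5]`. None of the names below are taken from the actual design.

```python
import secrets

ALPHAS = [1, 2, 3, 4, 5]  # illustrative public points (assumption)


def gf_mul(a, b):
    """Multiply in GF(2^4), assuming the reduction polynomial x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def interpolation_coefficients(alphas):
    coeffs = []
    for i, a_i in enumerate(alphas):
        c = 1
        for k, a_k in enumerate(alphas):
            if k != i:
                c = gf_mul(c, gf_mul(a_k, gf_inv(a_i ^ a_k)))
        coeffs.append(c)
    return coeffs


def share(x, alphas):
    a1, a2 = secrets.randbelow(16), secrets.randbelow(16)
    return [x ^ gf_mul(a1, al) ^ gf_mul(a2, gf_mul(al, al)) for al in alphas]


def reconstruct(shares, coeffs):
    x = 0
    for s, c in zip(shares, coeffs):
        x ^= gf_mul(s, c)
    return x


def add_constant(xs, c):
    """Z_i = X_i + c, computed by each player on its own share."""
    return [x ^ c for x in xs]


def mul_constant(xs, c):
    """Z_i = X_i * c, computed by each player on its own share."""
    return [gf_mul(x, c) for x in xs]


def add_shared(xs, ys):
    """Z_i = X_i + Y_i, computed by each player on its own two shares."""
    return [x ^ y for x, y in zip(xs, ys)]


LAMBDAS = interpolation_coefficients(ALPHAS)
```

Because no share ever leaves its player, these operations need neither fresh
randomness nor communication, in contrast to the multiplication of two shared
secrets.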
Multiplication of two shared secrets consists of the following three steps:
1. Each player $i$ first computes $t_i$:
\[
\begin{aligned}
t_i &= X_i Y_i \\
    &= (XY) + (a_1 Y + b_1 X)\alpha_i + (a_1 b_1 + a_2 Y + b_2 X)\alpha_i^2 \\
    &\quad + (a_1 b_2 + b_1 a_2)\alpha_i^3 + (a_2 b_2)\alpha_i^4, \quad \text{with } 1 \le i \le 5
\end{aligned}
\]
2. Each player $i$ then randomly selects two coefficients $a_{i,1}, a_{i,2}$ and
remasks $t_i$:
\[
\begin{aligned}
q_{i,1} &= t_i + (a_{i,1}\alpha_1) + (a_{i,2}\alpha_1^2) \\
q_{i,2} &= t_i + (a_{i,1}\alpha_2) + (a_{i,2}\alpha_2^2) \\
q_{i,3} &= t_i + (a_{i,1}\alpha_3) + (a_{i,2}\alpha_3^2) \\
q_{i,4} &= t_i + (a_{i,1}\alpha_4) + (a_{i,2}\alpha_4^2) \\
q_{i,5} &= t_i + (a_{i,1}\alpha_5) + (a_{i,2}\alpha_5^2)
\end{aligned}
\]
Each $q_{i,j}$ with $j \ne i$ is subsequently sent to the corresponding player $j$.
3. The values $q_{1,i}, \ldots, q_{5,i}$ available to each player $i$ after this
exchange are then reconstructed as
\[
Z_i = (q_{1,i}\lambda_1) + (q_{2,i}\lambda_2) + (q_{3,i}\lambda_3) + (q_{4,i}\lambda_4) + (q_{5,i}\lambda_5)
\]
This sequence of operations gives the shares corresponding to the correct masked
result $Z = XY$ in a secure way.
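The three steps can be traced in software to see why the output is again a valid
(5,2)-sharing. The Python sketch below models only the arithmetic of the
protocol, not its timing or glitch behaviour. It assumes the reduction
polynomial $x^4 + x + 1$ and the illustrative public points `[1, 2, 3, 4, 5]`,
and all function names are hypothetical.

```python
import secrets

ALPHAS = [1, 2, 3, 4, 5]  # illustrative public points (assumption)


def gf_mul(a, b):
    """Multiply in GF(2^4), assuming the reduction polynomial x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def interpolation_coefficients(alphas):
    coeffs = []
    for i, a_i in enumerate(alphas):
        c = 1
        for k, a_k in enumerate(alphas):
            if k != i:
                c = gf_mul(c, gf_mul(a_k, gf_inv(a_i ^ a_k)))
        coeffs.append(c)
    return coeffs


def share(x, alphas):
    a1, a2 = secrets.randbelow(16), secrets.randbelow(16)
    return [x ^ gf_mul(a1, al) ^ gf_mul(a2, gf_mul(al, al)) for al in alphas]


def reconstruct(shares, coeffs):
    x = 0
    for s, c in zip(shares, coeffs):
        x ^= gf_mul(s, c)
    return x


def shared_mul(xs, ys, alphas, coeffs):
    n = len(alphas)
    # Step 1: each player i multiplies its two shares locally (degree-4 sharing).
    t = [gf_mul(xs[i], ys[i]) for i in range(n)]
    # Step 2: each player i remasks t_i with a fresh degree-2 polynomial;
    # q[i][j] is the value q_{i,j} that player i sends to player j.
    q = [share(t[i], alphas) for i in range(n)]
    # Step 3: player j combines the received values with the public lambdas.
    z = []
    for j in range(n):
        acc = 0
        for i in range(n):
            acc ^= gf_mul(q[i][j], coeffs[i])
        z.append(acc)
    return z


LAMBDAS = interpolation_coefficients(ALPHAS)
```

Step 3 works because the $t_i$ are evaluations of a degree-4 polynomial with
constant term $XY$, so weighting them with the interpolation coefficients
recovers $XY$, while the fresh degree-2 masks from step 2 carry over to the
output shares.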
The square of a shared secret can only be computed in the straightforward way,
i.e., as $Z = X^2$ or
\[
Z_i = X_i^2 = X^2 + (a_1^2\alpha_i^2) + (a_2^2(\alpha_i^2)^2), \quad \text{with } 1 \le i \le 5
\]
when the $\alpha_i$, $1 \le i \le 5$, satisfy the conditions for Frobenius
stability. This means that for every $\alpha_i$, there exists an $\alpha_j$ such
that $\alpha_j = \alpha_i^2$. A reordering between every player $i$ and player $j$
satisfying $\alpha_j = \alpha_i^2$ is then required to keep the correct public
coefficient linked to its player. When this reordering is not performed, the
reconstruction of the correct masked secret $Z = X^2$ is not possible.
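The reordering can be made concrete with a small example. The Python sketch
below assumes the reduction polynomial $x^4 + x + 1$, under which the set
`[1, 2, 3, 4, 5]` happens to be closed under squaring. Both this set and the
function names are illustrative assumptions, not the coefficients used in the
implementation.

```python
import secrets

ALPHAS = [1, 2, 3, 4, 5]  # closed under squaring for x^4 + x + 1 (assumption)


def gf_mul(a, b):
    """Multiply in GF(2^4), assuming the reduction polynomial x^4 + x + 1."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf_inv(a):
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def interpolation_coefficients(alphas):
    coeffs = []
    for i, a_i in enumerate(alphas):
        c = 1
        for k, a_k in enumerate(alphas):
            if k != i:
                c = gf_mul(c, gf_mul(a_k, gf_inv(a_i ^ a_k)))
        coeffs.append(c)
    return coeffs


def share(x, alphas):
    a1, a2 = secrets.randbelow(16), secrets.randbelow(16)
    return [x ^ gf_mul(a1, al) ^ gf_mul(a2, gf_mul(al, al)) for al in alphas]


def reconstruct(shares, coeffs):
    x = 0
    for s, c in zip(shares, coeffs):
        x ^= gf_mul(s, c)
    return x


def is_frobenius_stable(alphas):
    """True if the square of every public point is again a public point."""
    return all(gf_mul(al, al) in alphas for al in alphas)


def shared_square(xs, alphas):
    """Square each share locally, then hand it to the player owning alpha_i^2."""
    out = [0] * len(alphas)
    for i, al in enumerate(alphas):
        j = alphas.index(gf_mul(al, al))  # player j with alpha_j = alpha_i^2
        out[j] = gf_mul(xs[i], xs[i])
    return out


LAMBDAS = interpolation_coefficients(ALPHAS)
```

The squared share of player $i$ is an evaluation at $\alpha_i^2$ of a polynomial
with constant term $X^2$, which is why it has to move to the player whose public
coefficient equals $\alpha_i^2$ before the usual reconstruction applies.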
Appendix C: Second-order Hardware Architecture
Fig. 5. Architecture diagram for the second-order PRESENT implementation.
Appendix D: Area requirements for the first-order and
second-order PRESENT S-box implementations
Table 3. Area in GE of the first-order and second-order PRESENT S-box
implementations.

                      Area (GE)
Component        first-order   second-order
multiplier                47             47
mult el1                 233            639
mult el2                 148            252
shared mult             1360           4969
add acc el               127            127
add acc                  379            630
ctrl el                  120            120
ctrl                     352            592
Control unit            1503           2147
S-box                   3594           8338
Appendix E: Welch’s t-Test
An easy way to test for potential side-channel leakages, which might lead to a
successful attack on a cryptographic system, is proposed by Goodwill et al. [8].
Because it is independent of any leakage model, this method is a convenient way
to test whether or not the implementation of the device effectively counteracts
SCA attacks. Although no single test can guarantee the revelation of all
vulnerabilities against all possible SCA attacks, this test is designed to be
sensitive enough to cover a wide range of potential problems. After acquisition
of a sufficient number of power traces, the traces are divided into two sets, A
and B, based on an intermediate value in the computation. The problem of assessing
whether there is potentially exploitable leakage or not is then formulated as a
hypothesis test. The null hypothesis corresponds to the statement "the mean
power curves of A and B are data-independent". The statistical test is Welch’s
t-test, a generalization of Student’s t-test allowing samples to have unequal
variances [26]. For the first statistical moment, the t-test statistic is calculated
as:
\[
t = \frac{\bar{T}_a - \bar{T}_b}{\sqrt{\dfrac{s_a^2}{N_a} + \dfrac{s_b^2}{N_b}}}
\]
where $\bar{T}_i$, $s_i^2$ and $N_i$ are the sample mean, sample variance and
sample size of the set $T_i$, $i \in \{a, b\}$. This formula can easily be
extended to higher statistical moments.
The t-test statistic is computed point-wise on the different sets of power
traces. If no point exceeds a certain confidence threshold ±C, then the null
hypothesis cannot be rejected, indicating that no relation was detected between
the processed intermediate value and the instantaneous power consumption. In
case the threshold is crossed, another t-test is performed on an independent set
of traces. When the t-test statistic exceeds ±C at the same points in time, the
null hypothesis can be rejected with a significance level related to C. In that
case, the alternative hypothesis holds, indicating that the power consumption
and the intermediate values are related in a statistically significant way,
making the device potentially vulnerable to SCA attacks.
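The point-wise evaluation described above can be sketched in a few lines of
Python using only the standard library. The trace layout (one list of samples
per trace), the function names and the threshold passed to `leaky_points` are
assumptions made for this illustration. The text does not fix a value for C.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(set_a, set_b):
    """Point-wise Welch t statistic for two lists of equal-length traces.

    Each set needs at least two traces, and the combined variance at a
    sample point must be non-zero, otherwise the division below fails.
    """
    n_a, n_b = len(set_a), len(set_b)
    t_values = []
    # zip(*traces) walks over the sample points (columns) of a trace set.
    for col_a, col_b in zip(zip(*set_a), zip(*set_b)):
        numerator = mean(col_a) - mean(col_b)
        denominator = sqrt(variance(col_a) / n_a + variance(col_b) / n_b)
        t_values.append(numerator / denominator)
    return t_values


def leaky_points(t_values, threshold):
    """Indices of sample points where |t| exceeds the threshold C."""
    return [k for k, t in enumerate(t_values) if abs(t) > threshold]


def confirmed_leaks(first, second, threshold):
    """Points exceeding the threshold in two independent measurements."""
    return sorted(set(leaky_points(first, threshold))
                  & set(leaky_points(second, threshold)))
```

`statistics.variance` computes the sample variance, which matches $s_i^2$ in the
formula. `confirmed_leaks` mirrors the repetition on an independent set of
traces by keeping only the points that cross the threshold in both runs.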
Figure 6 shows the resulting t-test statistic in case the alternative hypothesis
holds.
[Figure: t-test statistic (vertical axis, ticks from −20 to 40) plotted against
Samples (horizontal axis, 0 to 3 × 10^4).]
Fig. 6. Result of the t-test for the first-order PRESENT implementation with biased
masks.
