Differential Power Analysis In-Practice for Hardware Implementations of the Keccak Sponge Function by Graff, Nathaniel
DIFFERENTIAL POWER ANALYSIS RESISTANCE IN-PRACTICE
FOR HARDWARE IMPLEMENTATIONS OF
THE KECCAK SPONGE FUNCTION
A Thesis
presented to
the Faculty of California Polytechnic State University,
San Luis Obispo
In Partial Fulfillment
of the Requirements for the Degree
Master of Science in Electrical Engineering
by
Nathaniel Graff
June 2018
© 2018
Nathaniel Graff
ALL RIGHTS RESERVED
COMMITTEE MEMBERSHIP
TITLE: Differential Power Analysis Resistance In-
Practice for Hardware Implementations of
the Keccak Sponge Function
AUTHOR: Nathaniel Graff
DATE SUBMITTED: June 2018
COMMITTEE CHAIR: Andrew Danowitz, Ph.D.
Professor of Electrical and Computer Engineering
COMMITTEE MEMBER: Bruce DeBruhl, Ph.D.
Professor of Computer Engineering and Computer Science
COMMITTEE MEMBER: Joseph Callenes-Sloan, Ph.D.
Professor of Electrical and Computer Engineering
ABSTRACT
Differential Power Analysis Resistance In-Practice for Hardware Implementations of
the Keccak Sponge Function
Nathaniel Graff
The Keccak Sponge Function is the winner of the National Institute of Standards and Technology (NIST) competition to develop the Secure Hash Algorithm-3 Standard (SHA-3). Prior work has developed reference implementations of the algorithm and described the structures necessary to harden the algorithm against power analysis attacks, which can weaken the cryptographic properties of the hash algorithm. This work demonstrates the architectural changes to the reference implementation necessary to achieve the theoretical side channel-resistant structures, compares their efficiency and performance characteristics after synthesis and place-and-route when implemented on Field Programmable Gate Arrays (FPGAs), publishes the resulting implementations under the Massachusetts Institute of Technology (MIT) open source license, and shows that the resulting implementations demonstrably harden the sponge function against power analysis attacks.
ACKNOWLEDGMENTS
Thanks to:
• Dr. Andrew Danowitz for advising me on this project for the last three years
• My parents, Janet and Michael Graff, for all their love and support
• The White Hat Club and all of my friends who inspired my interest in security
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 Introduction
2 Theory of Operation
2.1 The Keccak Sponge Function
2.2 Power Analysis Attacks
2.3 Vulnerability of Keccak to Power Analysis Attacks
2.4 Additivity and Secret Sharing
2.5 Threshold Three-Share Implementation of Keccak
2.6 Preserving Uniformity
3 Implementation In-Practice
3.1 Reference Implementation
3.2 Threshold Implementation
3.3 Achieving Uniformity
3.4 Test Framework
4 Efficiency and Performance
4.1 Synthesis Results for FPGA
4.2 Analysis of Synthesis Results
5 Validation of Power Analysis Attack Resistance
5.1 Testing Methodology
5.1.1 DUT Set-Up and Power Trace Measurement
5.1.2 DUT Input Selection and Partitioning
5.1.3 T-Statistic Calculation and Analysis
5.2 Validation Results
6 Conclusion
BIBLIOGRAPHY
APPENDICES
A Code Listings
A.1 Threshold Round Permutation Module
A.2 Uniform Chi
A.3 Power Trace Capture Script
LIST OF TABLES
4.1 Optimization Strategies
5.1 T-Statistic Maximum Excursions
LIST OF FIGURES
2.1 Sponge Function Construction of a Hash Function [14]
3.1 Interconnection of modified single-share blocks
3.2 Interconnection of uniformity-preserving implementation
4.1 Comparing chip area utilization for different implementations and optimization strategies of the Keccak Sponge Function
4.2 Comparing on-chip power consumption for different implementations and optimization strategies of the Keccak Sponge Function
5.1 Underside of the Nexys 4 DDR after capacitor removal
5.2 Experimental set-up for power trace collection
5.3 T-statistics over 5000 collected power traces
Chapter 1
INTRODUCTION
Sponge functions are a recently popularized class of algorithm with many applications to hash function and cipher construction [8]. Keccak is a family of sponge functions created by Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles Van Assche [9] on top of the Keccak-f[b] family of permutations, and the instance built on its largest permutation, Keccak-f[1600], is the winner of the NIST competition to develop the Secure Hash Algorithm-3 (SHA-3) Standard [14]. The design of the Keccak Sponge Function departs from prior hash function standards such as MD5, SHA-1, and SHA-2 through its sponge construction [9]. This construction also allows the hash function to support new modes such as extendable output [14].
The Keccak sponge function has been subjected to significant scrutiny to verify
or disprove the algorithm’s suitability as a cryptographically-secure hash function
[7, 6, 19]. This thesis does not seek to analyze the algorithm’s information-theoretic
security. However, a major concern for the design of systems implementing crypto-
graphic computation is the threat of side-channel attacks (SCAs) [21]. This thesis
chiefly concerns itself with Keccak’s vulnerability to power-channel SCAs, wherein the
power consumption of the algorithm can be correlated with input to the algorithm,
breaking certain guarantees of cryptographic security [21].
Bertoni et al. have published reference implementations of the algorithm in the public domain and described techniques to harden the algorithm against power analysis attacks [11, 10, 12]. However, no implementation of these hardening techniques has been made public, and published analysis of the efficacy of these techniques has been limited to software simulation.
This work starts by describing the published techniques necessary to harden a
Keccak Sponge Function hardware accelerator implementation against power analysis
attack. I then describe how an unprotected hardware implementation of the Keccak
Sponge Function is modified to implement these power-channel hardening techniques.
Using a commercial Xilinx FPGA-based test platform [3], I demonstrate that the
modified algorithm yields the correct result, measure the design resource consumption
of the modifications, and enable the collection of power traces. Finally, I employ a
validation test procedure published by Rambus [15] to show that the power-channel
hardening techniques decrease the correlation between input data and power channel
by an order of magnitude after implementation on a commercial Xilinx FPGA.
Chapter 2
THEORY OF OPERATION
2.1 The Keccak Sponge Function
The Keccak Sponge Function is the primitive used in the construction of the SHA-3 family of cryptographically secure hash algorithms (CSHAs). A cryptographically secure hash function accepts a variable-length input string and produces a fixed-length output string such that the input string cannot feasibly be computed from the output string [20]. Additionally, it is infeasible to find two input strings which produce the same output string. The SHA-3 family also specifies a class of function called extendable-output functions (XOFs), which preserve the properties of a CSHA except that the output string of the XOF can be extended to an arbitrary length.
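The difference between a fixed-length hash and an XOF can be seen with Python's standard `hashlib` module, which provides SHA3-256 and SHAKE128 as standardized in [14]. This is a minimal illustration only and is not part of the hardware work:

```python
import hashlib

msg = b"The Keccak Sponge Function"

# Fixed-length CSHA: SHA3-256 always returns 32 bytes (256 bits).
digest = hashlib.sha3_256(msg).digest()

# XOF: the caller chooses the SHAKE128 output length.
short_out = hashlib.shake_128(msg).digest(16)
long_out = hashlib.shake_128(msg).digest(64)

print(len(digest), len(short_out), len(long_out))  # 32 16 64
# Asking for more output extends the string; the first 16 bytes are unchanged.
print(long_out[:16] == short_out)  # True
```

The prefix property follows from the squeezing phase: a longer output simply continues squeezing from the same sponge state.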
Members of the Keccak-f[b] Sponge Function family are notated Keccak-f[r+c], where r is the rate, c is the capacity, and b = r + c is the number of bits in the internal state matrix of the sponge function. The rate is the number of bits absorbed into or squeezed out of the state matrix between invocations of the permutation function [14]. For any rate and capacity, Keccak-f[1600] operates over a 5-by-5-by-64-bit (1600-bit) internal state matrix referred to as the sponge. When indexing the bits of the state matrix, the short axes are indexed as a “row” or “column” and the long axis is indexed as a “lane” [14].
During computation, the sponge function undergoes three steps: absorption, permutation, and squeezing, as shown in Figure 2.1. The input to the function is first padded using the pad10*1 padding scheme and then broken into rate-sized blocks [14]. Each block is then “absorbed” in turn by XORing it into the r-bit rate portion of the sponge, and the sponge permutation function is applied to the sponge after every absorption [14]. After the last block is absorbed and permuted, the hash result is “squeezed” out of the sponge in rate-sized blocks, with the permutation applied again between successive squeezed blocks [14].
Figure 2.1: Sponge Function Construction of a Hash Function [14]
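To make the absorb, permute, and squeeze phases concrete, the following is a minimal, unoptimized Python model of the sponge construction built on Keccak-f[1600]. It is a sketch for study, not the reference implementation: the lane layout follows [14], and the `suffix` parameter (an assumption of this sketch) carries the domain-separation bits that precede pad10*1, with 0x01 giving plain pad10*1 and 0x1F giving SHAKE. Because `hashlib` does not expose the r = 1024 instance, the model is checked against SHAKE128 and SHA3-256 instead.

```python
import hashlib

MASK = (1 << 64) - 1

# iota round constants for the 24 rounds of Keccak-f[1600]
RC = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# rho rotation offsets, indexed ROT[x][y]
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK


def keccak_f1600(A):
    """Apply the 24 rounds of Keccak-f[1600] to lanes A[x][y]; returns a new state."""
    for rc in RC:
        # theta: XOR every lane with the parities of two neighbouring columns
        C = [A[x][0] ^ A[x][1] ^ A[x][2] ^ A[x][3] ^ A[x][4] for x in range(5)]
        D = [C[(x - 1) % 5] ^ rotl(C[(x + 1) % 5], 1) for x in range(5)]
        A = [[A[x][y] ^ D[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to position (y, 2x + 3y)
        B = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                B[y][(2 * x + 3 * y) % 5] = rotl(A[x][y], ROT[x][y])
        # chi: the only non-linear step
        A = [[B[x][y] ^ (~B[(x + 1) % 5][y] & B[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota: add the round constant to lane (0, 0)
        A[0][0] ^= rc
    return A


def keccak_sponge(msg: bytes, rate_bytes: int, suffix: int, out_len: int) -> bytes:
    """Sponge over Keccak-f[1600]: pad, absorb rate-sized blocks, then squeeze."""
    # pad10*1: suffix byte (domain bits plus the first 1), zeros, final 1 as 0x80
    padded = bytearray(msg) + bytes([suffix])
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] ^= 0x80
    A = [[0] * 5 for _ in range(5)]
    lanes = rate_bytes // 8
    # absorb: XOR each block into the first r bits of the state, then permute
    for off in range(0, len(padded), rate_bytes):
        for i in range(lanes):
            chunk = padded[off + 8 * i:off + 8 * i + 8]
            A[i % 5][i // 5] ^= int.from_bytes(chunk, "little")
        A = keccak_f1600(A)
    # squeeze: read r bits at a time, permuting between output blocks
    out = bytearray()
    while True:
        for i in range(lanes):
            out += A[i % 5][i // 5].to_bytes(8, "little")
        if len(out) >= out_len:
            return bytes(out[:out_len])
        A = keccak_f1600(A)
```

For the instance used in this work, `keccak_sponge(msg, 128, 0x01, 128)` would model Keccak-f[r=1024,c=576] with plain pad10*1 and a single 1024-bit output block.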
The sponge permutation function consists of 24 rounds. Each round of the permutation consists of five steps named θ, ρ, π, χ, and ι (theta, rho, pi, chi, and iota) [14]. Each step takes a state matrix as input and produces a state matrix as output. Of these, this work primarily concerns itself with χ, because it is the only step which must be substantially modified to build a power analysis-resistant implementation of Keccak [10].
This work focuses on the Keccak-f[r=1024,c=576] instance, which can produce output of arbitrary length in the same way as the SHAKE128 extendable-output function (XOF) [14]: it behaves like a traditional hash function, but with an output length chosen by the caller. The two are closely related but not identical, since the SHA-3 standard [14] defines SHAKE128 with r=1344 and c=256. This instance was chosen solely because the published reference hardware implementation uses the same rate and capacity.
2.2 Power Analysis Attacks
Power analysis attacks are a subclass of side-channel attack (SCA) in which the attacker has knowledge of the power consumption of the device under attack [21]. When cryptographic algorithms are implemented naïvely, variations in power consumption between operations can leak information. Differential power analysis (DPA), a subclass of these attacks, can be used against algorithms which run repeatedly with a constant secret input or intermediate state: the attacker extracts the secret statistically from thousands of runs in which that value remained constant [17].
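The principle behind such attacks can be illustrated with a toy simulation. The sketch below is a correlation power analysis (CPA), a close relative of the difference-of-means DPA described above, and every element of it is hypothetical: the device is assumed to leak the Hamming weight of a known input XORed with a constant secret byte, with no measurement noise. The attacker predicts the leakage for each of the 256 candidate secrets and keeps the candidate whose predictions correlate best with the simulated samples.

```python
from math import sqrt
from statistics import fmean


def hamming_weight(v: int) -> int:
    return bin(v).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


SECRET = 0x5A               # hypothetical secret byte, constant across runs
inputs = list(range(256))   # one run per known input byte
# Simulated, noise-free power sample per run: Hamming weight of input XOR secret.
samples = [hamming_weight(p ^ SECRET) for p in inputs]


def score(guess: int) -> float:
    """How well the leakage predicted for `guess` matches the samples."""
    return pearson([hamming_weight(p ^ guess) for p in inputs], samples)


best = max(range(256), key=score)
print(hex(best))  # 0x5a
```

With real traces, measurement noise and the rest of the device's activity blur each sample, which is why practical attacks need the thousands of runs mentioned above before the correct candidate stands out.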
2.3 Vulnerability of Keccak to Power Analysis Attacks
Power analysis attacks “do not exploit an inherent weakness of an algorithm,” but rather characteristics of its implementation [10]. Prior work by Bertoni et al. has shown theoretically that the power trace distributions of an unprotected implementation of the Keccak Sponge Function are distinguishable with respect to input data [10]. Bertoni et al. go on to show that a secret sharing algorithm can reduce and remove this distinguishability, and the algorithm they describe is the protection technique tested by this thesis.
2.4 Additivity and Secret Sharing
The hardening techniques to construct a power analysis-resistant implementation
of the Keccak Sponge Function make use of a number of mathematical properties.
First among these is the property of additivity (Definition 2.4.1), which is used to
implement a technique called “secret sharing”.
Definition 2.4.1. Property of Additivity [13]
Given a linear function H and two values A and B in the domain of H:
$$H(A + B) = H(A) + H(B)$$
The Keccak Sponge Function is linear in all of its operations except the permutation step χ [10]; the round-constant addition ι is affine rather than strictly linear, and Section 2.5 shows how it is handled. If an implementation of Keccak that satisfies the property of additivity can be constructed, referred to hereafter as K′, then Proposition 2.4.1 holds.
Proposition 2.4.1. Application of a Linear Sponge Function K ′ [10]
Given a message M and a random bitstream N of the same length as M:
$$\mathrm{Keccak}(M) = K'(M \oplus N) \oplus K'(N)$$
The ⊕ symbol represents bitwise exclusive or.
Definition 2.4.1 can be applied repeatedly to extend Proposition 2.4.1 to an arbitrary number of random bitstreams. To protect the algorithm against power analysis attacks, the computation of any one output share must be independent of at least one input share [10]. Bertoni et al. showed that hardware implementations of Keccak require three shares (generated using two random bitstreams) to provide resistance to power analysis attacks [10]. The arguments to each instance of K′ are then defined as “shares”, as in Definition 2.4.2.
Definition 2.4.2. Input Shares [10]
Given a message M and two random bitstreams N_1 and N_2, each of length(M):
$$A = M \oplus N_1 \oplus N_2$$
$$B = N_1$$
$$C = N_2$$
The sponge state for each share will be notated a, b, and c, corresponding to that share. The resulting message hash Keccak(M) can be calculated by taking the bitwise exclusive or K′(A) ⊕ K′(B) ⊕ K′(C). The power analysis attack-resistant sponge function created by this three-share implementation is referred to as the Threshold Three-Share Implementation.
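Additivity is what allows the linear steps to be computed on each share separately. The following Python sketch is illustrative and not taken from the reference implementation: it splits a random 1600-bit state into three shares as in Definition 2.4.2, applies the θ step to each share independently, and checks that the XOR of the results equals θ applied to the unshared state. The masks are drawn from Python's `secrets` module, a cryptographically secure source.

```python
import secrets

MASK = (1 << 64) - 1


def rotl(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((v << n) | (v >> (64 - n))) & MASK


def theta(state):
    """Keccak theta step on a 5x5 array of 64-bit lanes indexed state[x][y]."""
    parity = [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4]
              for x in range(5)]
    d = [parity[(x - 1) % 5] ^ rotl(parity[(x + 1) % 5], 1) for x in range(5)]
    return [[state[x][y] ^ d[x] for y in range(5)] for x in range(5)]


def xor_states(*states):
    """Lane-wise XOR of any number of states."""
    out = [[0] * 5 for _ in range(5)]
    for s in states:
        for x in range(5):
            for y in range(5):
                out[x][y] ^= s[x][y]
    return out


def random_state():
    """A 1600-bit state filled from a cryptographically secure source."""
    return [[secrets.randbits(64) for _ in range(5)] for _ in range(5)]


# Secret sharing as in Definition 2.4.2: A = M xor N1 xor N2, B = N1, C = N2.
message = random_state()
n1, n2 = random_state(), random_state()
share_a, share_b, share_c = xor_states(message, n1, n2), n1, n2

# Apply the linear step to each share independently, then recombine.
recombined = xor_states(theta(share_a), theta(share_b), theta(share_c))
print(recombined == theta(message))  # True
```

No single share reveals anything about the message on its own; only the XOR of all three does.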
2.5 Threshold Three-Share Implementation of Keccak
Of the steps that make up the Keccak-f permutation function, χ is the only step which does not satisfy the property of additivity. To remedy this, Bertoni et al. proposed a replacement for χ called χ′, which is logically equivalent to χ when computed over the full set of sponge states {a, b, c} [10].
In the unprotected, single-share implementation of Keccak, χ is defined by Bertoni et al. [10] as in Definition 2.5.1.
Definition 2.5.1. Single-Share Implementation of χ [10]
Given sponge state a and row index x ∈ [0 . . . 4], with indices taken modulo 5:
$$a^{\chi}_{\mathrm{out},x} = \chi(a)_x = a_x \oplus (a_{x+1} \oplus 1)\, a_{x+2}$$
Bertoni et al. [10] modify this operation to obtain χ′, shown in Definition 2.5.2.
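Definition 2.5.2. Threshold Implementation of χ′ [10]
Given sponge states {a, b, c} and row index x ∈ [0 . . . 4]:
$$a^{\chi}_{\mathrm{out},x} = \chi'(b, c)_x = b_x \oplus (b_{x+1} \oplus 1)\, b_{x+2} \oplus b_{x+1} c_{x+2} \oplus b_{x+2} c_{x+1}$$
$$b^{\chi}_{\mathrm{out},x} = \chi'(c, a)_x = c_x \oplus (c_{x+1} \oplus 1)\, c_{x+2} \oplus c_{x+1} a_{x+2} \oplus c_{x+2} a_{x+1}$$
$$c^{\chi}_{\mathrm{out},x} = \chi'(a, b)_x = a_x \oplus (a_{x+1} \oplus 1)\, a_{x+2} \oplus a_{x+1} b_{x+2} \oplus a_{x+2} b_{x+1}$$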
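Because χ operates independently on each 5-bit row, the claim that χ′ reproduces χ when the three output shares are recombined can be checked exhaustively in software. The following Python sketch (the function names are hypothetical) enumerates all 2^15 combinations of row shares a, b, and c and compares the XOR of the three χ′ outputs with χ applied to a ⊕ b ⊕ c. Each output share is computed without one of the input shares, which is the non-completeness property discussed in Section 2.6.

```python
from itertools import product


def bit(v: int, i: int) -> int:
    """Bit i of a 5-bit row, with the index taken modulo 5."""
    return (v >> (i % 5)) & 1


def chi(a: int) -> int:
    """Single-share chi on one row (Definition 2.5.1)."""
    out = 0
    for x in range(5):
        out |= (bit(a, x) ^ ((bit(a, x + 1) ^ 1) & bit(a, x + 2))) << x
    return out


def chi_prime(b: int, c: int) -> int:
    """One output share of the threshold chi' (Definition 2.5.2)."""
    out = 0
    for x in range(5):
        v = (bit(b, x) ^ ((bit(b, x + 1) ^ 1) & bit(b, x + 2))
             ^ (bit(b, x + 1) & bit(c, x + 2)) ^ (bit(b, x + 2) & bit(c, x + 1)))
        out |= v << x
    return out


def threshold_chi_is_correct() -> bool:
    """Compare recombined chi' shares with chi for every row sharing (a, b, c)."""
    for a, b, c in product(range(32), repeat=3):
        a_out = chi_prime(b, c)  # computed without share a
        b_out = chi_prime(c, a)  # computed without share b
        c_out = chi_prime(a, b)  # computed without share c
        if a_out ^ b_out ^ c_out != chi(a ^ b ^ c):
            return False
    return True


print(threshold_chi_is_correct())  # True
```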
When χ′ is substituted for χ in a three-share linearization of the Keccak sponge
function, there is only one additional modification necessary to make the resulting
computation result identical to a single-share non-linearized Keccak implementation.
The ι step must be applied to only one of the three shares of the sponge function [10].
Definition 2.5.3. Three-Share Implementation of ι [10]
Given sponge states {a, b, c} and without loss of generality:
$$a^{\iota}_{\mathrm{out}} = \iota(a)$$
$$b^{\iota}_{\mathrm{out}} = b$$
$$c^{\iota}_{\mathrm{out}} = c$$
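Restricting ι to one share is sufficient because ι only XORs a round constant into lane (0, 0). Writing RC_i for a state that holds round i's constant in that lane and zeros elsewhere, the recombined shares after ι satisfy:

```latex
\iota(a) \oplus b \oplus c = (a \oplus RC_i) \oplus b \oplus c = (a \oplus b \oplus c) \oplus RC_i = \iota(a \oplus b \oplus c)
```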
2.6 Preserving Uniformity
Bilgin et al. have shown that the threshold implementation of χ′ is not sufficient for securing the Keccak Sponge Function against first-order differential power analysis [12]. They state that for a function f to be resistant to first-order DPA, it must be both non-complete and uniform. Non-completeness requires that each output share of f be independent of at least one input share. This property holds for the threshold χ′, as Definition 2.5.2 shows: $a^{\chi}_{\mathrm{out}} = \chi'(b, c)$ does not depend on a. Uniformity requires that a uniformly random sharing of the input produce a uniformly random sharing of the output. The threshold χ′ does not preserve a uniform random distribution of the shares over the 24 rounds of the Keccak permutation function, because it is not invertible: it maps multiple input sharings to the same output sharing [12].
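Because χ is a permutation of each 5-bit row, the three-share χ′ is uniform only if the map from input shares (a, b, c) to output shares is itself a permutation of the 2^15 possible row sharings. The following Python sketch (hypothetical helper names) counts the distinct output sharings produced over all inputs; following the argument in [12], the count is expected to fall below 2^15, meaning some input sharings collide.

```python
from itertools import product


def bit(v: int, i: int) -> int:
    """Bit i of a 5-bit row, with the index taken modulo 5."""
    return (v >> (i % 5)) & 1


def chi_prime(b: int, c: int) -> int:
    """One output share of the threshold chi' (Definition 2.5.2)."""
    out = 0
    for x in range(5):
        v = (bit(b, x) ^ ((bit(b, x + 1) ^ 1) & bit(b, x + 2))
             ^ (bit(b, x + 1) & bit(c, x + 2)) ^ (bit(b, x + 2) & bit(c, x + 1)))
        out |= v << x
    return out


def shared_chi(a: int, b: int, c: int) -> tuple:
    """Map one row sharing (a, b, c) to its output sharing."""
    return chi_prime(b, c), chi_prime(c, a), chi_prime(a, b)


outputs = {shared_chi(a, b, c) for a, b, c in product(range(32), repeat=3)}
# Fewer than 32**3 distinct results means distinct input sharings collide.
print(len(outputs), "distinct output sharings out of", 32 ** 3)
```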
Bilgin et al. preserve uniformity across the three shares by injecting additional randomness during the χ step [12]. This work follows the scheme proposed in [12], where P and S are each 2-bit random vectors unique to each round of the permutation, and χ′ is as defined in Definition 2.5.2 and in Bertoni et al.'s prior work [10].
A uniformity-preserving implementation of χ′ is shown in Definitions 2.6.1, 2.6.2, and 2.6.3, as defined by Bilgin et al. [12].
Definition 2.6.1. Uniform χ′ for row x ∈ [0 . . . 2] and column y ∈ [0 . . . 4] [12]
Given the threshold implementation of χ′ as in Definition 2.5.2:
$$a^{\chi}_{\mathrm{out}} = \chi'(b, c)$$
$$b^{\chi}_{\mathrm{out}} = \chi'(c, a)$$
$$c^{\chi}_{\mathrm{out}} = \chi'(a, b)$$
Definition 2.6.2. Uniform χ′ for row x ∈ [3 . . . 4] and column y = 0 [12]
Given the threshold implementation of χ′ as in Definition 2.5.2:
$$a^{\chi}_{\mathrm{out}} = \chi'(b, c) \oplus P_{x-2} \oplus S_{x-2}$$
$$b^{\chi}_{\mathrm{out}} = \chi'(c, a) \oplus P_{x-2}$$
$$c^{\chi}_{\mathrm{out}} = \chi'(a, b) \oplus S_{x-2}$$
Definition 2.6.3. Uniform χ′ for row x ∈ [3 . . . 4] and column y ∈ [1 . . . 4] [12]
Given the threshold implementation of χ′ as in Definition 2.5.2:
$$a^{\chi}_{\mathrm{out}} = \chi'(b, c) \oplus a_{x,y-1} \oplus b_{x,y-1}$$
$$b^{\chi}_{\mathrm{out}} = \chi'(c, a) \oplus a_{x,y-1}$$
$$c^{\chi}_{\mathrm{out}} = \chi'(a, b) \oplus b_{x,y-1}$$
When this uniform implementation is substituted for χ in a three-share implemen-
tation of Keccak and ι is only applied to a single share, the result is again identical
to the single-share, non-linearized implementation and the implementation is referred
to as the Uniformity-Preserving Three-Share Implementation.
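The injected terms cancel when the shares are recombined. Using the property from Section 2.5 that χ′(b, c) ⊕ χ′(c, a) ⊕ χ′(a, b) = χ(a ⊕ b ⊕ c), the rows covered by Definition 2.6.2 recombine as follows; the same argument applies to Definition 2.6.3 with a_{x,y-1} and b_{x,y-1} in place of P and S:

```latex
a^{\chi}_{\mathrm{out}} \oplus b^{\chi}_{\mathrm{out}} \oplus c^{\chi}_{\mathrm{out}}
  = \chi'(b,c) \oplus \chi'(c,a) \oplus \chi'(a,b)
    \oplus (P_{x-2} \oplus S_{x-2}) \oplus P_{x-2} \oplus S_{x-2}
  = \chi(a \oplus b \oplus c)
```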
Chapter 3
IMPLEMENTATION IN-PRACTICE
The theory of protecting the Keccak Sponge Function against power analysis attack
has been discussed and simulated by prior work [10, 12]. This work extends the
analysis of these techniques to provide an independent verification of the correctness
and resistance of the resulting algorithm. To accomplish this, a synthesizable VHDL
implementation of the modified algorithm was created, starting with a public domain
reference implementation of the algorithm.
3.1 Reference Implementation
With the theory of protecting the Keccak Sponge Function from power analysis attack established, the task of implementing and validating the protection techniques can be discussed. This work bases the development of a protected hardware implementation on version 3.1 of the VHDL reference implementation published by Bertoni et al. [11].
The published reference implementation implements the Keccak-f[r=1024,c=576] sponge function, the same instance described in Section 2.1. The reference offers a choice of several hardware accelerator designs. For this work, the “high speed core” design was chosen because the protection techniques could be applied to it with minimal changes to the structure of the implementation. The other designs consume less chip area and power by serializing computation. For validating the protection techniques, however, minimal chip area was not the highest priority: the FPGA test platform is not significantly constrained by design area or power consumption, and choosing the design with the least control-signal overhead simplifies the modifications to the χ step.
The reference implementation omits two steps required of a complete sponge-based hash or XOF: input padding and output truncation. The input to the device must be delivered pre-padded using the pad10*1 scheme [14] in exactly rate-sized chunks, and the output of the function is delivered in exactly rate-sized chunks, so any output truncation must be performed by the consumer of the hardware implementation. For this work, the padding scheme was applied during input vector generation and a single 1024-bit block was taken as output, eliminating the need for the omitted steps in hardware.
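The input vector preparation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script used in this work: messages are whole bytes, bits are ordered within bytes as in [14] (so the first padding bit is 0x01 and the last is 0x80), and the function names are hypothetical. The random message bytes are drawn with `os.urandom`, which reads the operating system's randomness source (such as /dev/urandom on Unix).

```python
import os

RATE_BYTES = 1024 // 8  # r = 1024 bits, the rate of the reference core


def pad10star1(msg: bytes, rate_bytes: int = RATE_BYTES) -> bytes:
    """Keccak pad10*1 for byte-aligned messages: a 1 bit, 0 bits, then a final 1 bit.

    Within a byte the first padding bit is 0x01 and the last is 0x80; when only
    one byte is free they share it, giving 0x81.
    """
    padded = bytearray(msg) + b"\x01"
    padded += bytes(-len(padded) % rate_bytes)
    padded[-1] |= 0x80
    return bytes(padded)


def to_blocks(padded: bytes, rate_bytes: int = RATE_BYTES) -> list:
    """Split a padded message into the rate-sized chunks the core consumes."""
    return [padded[i:i + rate_bytes] for i in range(0, len(padded), rate_bytes)]


# Hypothetical test vector: 200 random message bytes from the OS randomness source.
message = os.urandom(200)
blocks = to_blocks(pad10star1(message))
print(len(blocks), [len(b) for b in blocks])  # 2 [128, 128]
```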
The high speed core contains three main components: a finite state machine for
driving the computation, a block which implements a single round of the permutation
function, and a buffer of the sponge state.
3.2 Threshold Implementation
The Threshold Implementation of the Keccak Sponge Function is resistant to simple power analysis attacks (see Section 2.2 for a discussion of power analysis attack types). The first step of creating the threshold implementation is to implement the secret-sharing algorithm (Definition 2.4.2). A wrapper module, keccak_three_share, manages the random bit mixing in the secret sharing algorithm, the synchronization of control signals, and the configuration of the separate Keccak algorithm shares. The wrapper module provides the same interface as the original unprotected Keccak module, allowing it to serve as a drop-in replacement for the unprotected algorithm.
Code. Three-Share Secret Sharing in VHDL
-- Input Share Computation (Definition 2.4.2)
share_1 <= rand_1;
share_2 <= rand_2;
share_3 <= (rand_1 xor rand_2 xor din);
-- Output Share Recombination
dout <= (share_1_out xor share_2_out xor share_3_out);
For the purposes of test framework development, the “random” bitstream is generated by a 64-bit constant-seeded linear feedback shift register (LFSR). Such a generator is insufficient for resisting power analysis attacks because its output is deterministic and therefore predictable. However, this choice has a number of advantages for testing and validation. The simplicity of the LFSR allows it to be instantiated multiple times with minimal effect on FPGA area or power consumption, so the efficiency and performance data are minimally affected by the choice of random number generator. Also, although the bitstream is deterministic, the input vectors generated for the validation step of this work were created using random bytes from /dev/urandom on Unix. As a result, the deterministic bitstream generated by the LFSR is uncorrelated with the input data, so the validation methodology followed here is not affected by the choice of pseudo-random number generator (PRNG).
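The determinism of an LFSR is easy to demonstrate. The Python sketch below models a 64-bit Fibonacci LFSR; the tap positions (64, 63, 61, 60) and the seed are assumptions for illustration, not the values used in this work. Two instances started from the same constant seed emit identical streams, and an observer who sees any 64 consecutive output bits can reconstruct the register state and predict every later bit.

```python
class Lfsr64:
    """Hypothetical 64-bit Fibonacci LFSR (taps assumed, not from this work)."""

    TAPS = (64, 63, 61, 60)
    MASK = (1 << 64) - 1

    def __init__(self, seed: int):
        if seed & self.MASK == 0:
            raise ValueError("an all-zero state never changes")
        self.state = seed & self.MASK

    def next_bit(self) -> int:
        # Feedback is the XOR of the tapped bits (tap t is state bit t - 1).
        fb = 0
        for t in self.TAPS:
            fb ^= (self.state >> (t - 1)) & 1
        # Shift left and insert the feedback bit, which is also the output bit.
        self.state = ((self.state << 1) | fb) & self.MASK
        return fb

    def bits(self, n: int) -> list:
        return [self.next_bit() for _ in range(n)]


def recover_state(observed: list) -> int:
    """Rebuild the register from 64 consecutive output bits (oldest first)."""
    state = 0
    for b in observed[-64:]:
        state = (state << 1) | b
    return state


SEED = 0x0123456789ABCDEF  # hypothetical constant seed

# Same seed, same stream: the output is fully deterministic.
print(Lfsr64(SEED).bits(256) == Lfsr64(SEED).bits(256))  # True

# Observing 64 bits is enough to clone the generator and predict the rest.
victim = Lfsr64(SEED)
clone = Lfsr64(recover_state(victim.bits(64)))
print(clone.bits(128) == victim.bits(128))  # True
```

This predictability is harmless for the validation described here, where the inputs come from an independent random source, but it is exactly why a deployed design needs a cryptographically secure generator for its masks.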
Within the secret sharing algorithm, three copies of the single-share permutation function are instantiated and modified. For each share of the threshold implementation, χ′ is a function of the output of the π step of the other two shares (Definition 2.5.2). Therefore, the instantiated single-share block was modified to output the result of the π step of its permutation function and to accept, as input to its χ step, the π outputs of the other two shares. The single-share block was also given a Boolean iota-enable input that allows the wrapper to enable the ι step in only one instance of the algorithm, as in Definition 2.5.3. The three instances were then interconnected as shown in Figure 3.1.
[Figure: modified single-share blocks A, B, and C, cross-connected so that each block's output (out a, out b, out c) feeds the inputs (in a, in b, in c) of the other two blocks]
Figure 3.1: Interconnection of modified single-share blocks
The modified VHDL threshold implementation of the Keccak permutation round
module can be seen in-full in the Appendix section A.1.
3.3 Achieving Uniformity
Definition 2.6.3 demonstrates that the uniform χ′ is not symmetric with respect to the input shares, because bits from shares A and B are re-injected to preserve uniformity. The uniform χ step is therefore placed in its own submodule in the three-share wrapper, resulting in the interconnection scheme shown in Figure 3.2.
The VHDL implementation of the three-share uniform χ′ can be seen in-full in the Appendix section A.2.
[Figure: modified single-share blocks A, B, and C connected through a separate Uniform χ′ block; the blocks' outputs (out a, out b, out c) feed the Uniform χ′ inputs (in a, in b, in c), and its outputs (out a, out b, out c) return to the blocks]
Figure 3.2: Interconnection of uniformity-preserving implementation
3.4 Test Framework
The reference implementation of the Keccak Sponge Function is a hardware acceler-
ator and must be driven by a top-level block which handles the accelerator’s control
signals, inputs, and outputs. These tasks were accomplished by the development of
a top-level block which incorporated the following elements:
• A clock frequency resampler to adjust the clock frequency, allowing the design
to meet power trace measurement bandwidth requirements
• A block memory for storing input test vectors
• A USB UART so that the device could be controlled by a PC over USB.
The resulting platform was synthesized using Xilinx Vivado WebPACK 2015.2 [5]
for a Digilent Nexys 4 DDR FPGA Development Board [3], featuring a Xilinx Artix-7
FPGA [1].
Chapter 4
EFFICIENCY AND PERFORMANCE
4.1 Synthesis Results for FPGA
The test platform was synthesized for a Digilent Nexys 4 DDR FPGA Development
Board, featuring a Xilinx Artix-7 FPGA using Xilinx Vivado 2015.2. To estimate
the FPGA resource consumption of the designs, measurements of the area and power
characteristics were collected from the Vivado design report after bitstream genera-
tion. The unprotected single-share, threshold three-share, and uniform three-share
implementations were synthesized using various optimization settings. The optimiza-
tion strategies for each design are referred to in shorthand as described in Table 4.1.
Utilization of the Artix-7 FPGA is measured in flip-flop and look-up table slices as
reported by Vivado after the Implementation step. Figure 4.1 displays the utilization
of each implementation, normalized to the total number of slices used by the single-
share defaults-optimized implementation.
Power consumption is as reported by Vivado after the Implementation step. As in Figure 4.1, the values reported in Figure 4.2 are normalized to the on-chip power consumption of the single-share defaults-optimized implementation.
Table 4.1: Optimization Strategies
Optimization Strategy Shorthand | Synthesis Setting       | Implementation Setting
Defaults                        | Defaults                | Defaults
Optimize Area                   | Flow_AreaOptimized_high | Area_Explore
Optimize Performance            | Flow_PerfOptimized_high | Performance_Explore
Optimize Power                  | Flow_AreaOptimized_high | Power_DefaultOpt
[Figure: bar chart “Post-Place and Route Test Fixture Artix-7 Utilization”; normalized chip area (axis 0.00 to 2.00) for the Single Share, Threshold 3-Share, and Uniform 3-Share implementations, each under the Defaults, Optimize Area, Optimize Performance, and Optimize Power strategies]
Figure 4.1: Comparing chip area utilization for different implementations and optimization strategies of the Keccak Sponge Function
4.2 Analysis of Synthesis Results
Though the techniques described to implement the three-share logic represent three
parallel implementations of the Keccak sponge function, the design only consumes
approximately double the chip area of the baseline implementation. However, the
three-share design does consume up to three times the power of the single-share
algorithm.
It is important to note that the data presented here do not fully represent the additional resources required for a protected implementation of Keccak. Generating a high-entropy random bitstream will increase the demand on area and power. For this work, random number generation is approximated by an LFSR, as discussed in Section 3.2. For the protection mechanisms to withstand an actual attack, the LFSR must be replaced by a cryptographically secure pseudo-random number generator (CSPRNG), because if an attacker can predict the random bitstream, input masking is no longer effective [10].
[Figure: bar chart “Post-Place and Route Test Fixture Artix-7 Power Consumption”; normalized power consumption (axis 0.0 to 3.0) for the Single Share, Threshold 3-Share, and Uniform 3-Share implementations, each under the Defaults, Optimize Area, Optimize Performance, and Optimize Power strategies]
Figure 4.2: Comparing on-chip power consumption for different implementations and optimization strategies of the Keccak Sponge Function
Because the power consumption roughly triples while the area roughly doubles, it is more likely that the inclusion of a protected hardware implementation will be constrained by the available power supply or by chip heat dissipation before chip area is exhausted. Regardless, the architecture presented in this thesis is structured for throughput, not for efficiency, and the data presented here show that the cost of protecting a high-speed hardware accelerator for the Keccak Sponge Function against power analysis attack is considerable.
Future work on implementing the protected Keccak Sponge Function has room to expand in this domain. Bertoni et al. also describe a serial architecture as an alternative to the parallel design implemented in this work [10]. Such an architecture could allow a single instance of the modified single-share algorithm to compute the result without leaking information through the power channel, at the cost of taking three times as many cycles to complete the computation.
Chapter 5
VALIDATION OF POWER ANALYSIS ATTACK RESISTANCE
5.1 Testing Methodology
To validate the techniques for hardening the Keccak Sponge Function against power analysis attacks, I used the testing methodology described by Goodwill et al. [15]. In this section, I provide an overview of how I adapted the methodology to test my implementation of the Keccak Sponge Function.
The original methodology provides a pass/fail criterion for determining whether a device under test (DUT) exhibits a correlation between intermediate state during computation and fluctuations in its power consumption that exceeds a maximum allowed threshold. Exceeding this threshold indicates that the DUT is vulnerable to power analysis attack. There are three steps to the validation methodology:
1. DUT power trace measurement
2. DUT input selection and partitioning
3. T-statistic calculation and analysis
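The third step reduces to computing Welch's t-statistic at every sample point of two groups of power traces. The Python sketch below is illustrative only: the trace values are made up, the helper names are hypothetical, and the |t| > 4.5 threshold is assumed here as the value commonly associated with this methodology rather than a figure taken from this section.

```python
from math import sqrt
from statistics import fmean, variance

TVLA_THRESHOLD = 4.5  # assumed pass/fail bound on |t|


def welch_t(group_a, group_b):
    """Welch's t-statistic at each sample index of two sets of equal-length traces.

    Each group needs at least two traces, and the samples at each index must
    not all be identical in both groups (otherwise the standard error is zero).
    """
    n_a, n_b = len(group_a), len(group_b)
    t_values = []
    # zip(*group) transposes a list of traces into per-sample-index columns.
    for col_a, col_b in zip(zip(*group_a), zip(*group_b)):
        mean_diff = fmean(col_a) - fmean(col_b)
        std_err = sqrt(variance(col_a) / n_a + variance(col_b) / n_b)
        t_values.append(mean_diff / std_err)
    return t_values


def leaky_points(t_values, threshold=TVLA_THRESHOLD):
    """Sample indices whose |t| exceeds the threshold."""
    return [i for i, t in enumerate(t_values) if abs(t) > threshold]


# Two tiny made-up groups of two-sample traces.
group_a = [[1.0, 5.0], [3.0, 7.0]]
group_b = [[0.0, 5.0], [2.0, 7.0]]
t_values = welch_t(group_a, group_b)
print([round(t, 4) for t in t_values])  # [0.7071, 0.0]
print(leaky_points(t_values))  # []
```

A sample point where |t| exceeds the threshold indicates that the power consumption of the two groups can be distinguished at that point in time.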
5.1.1 DUT Set-Up and Power Trace Measurement
The unprotected, protected three-share, and protected three-share uniformity-preserving
implementations of the Keccak Sponge Function were synthesized for the Xilinx Artix-
7 FPGA on the Digilent Nexys 4 DDR FPGA Development Board as discussed in
Chapter 4. This platform was chosen because of the availability and low cost of proto-
typing digital logic on FPGA platforms and because the development board supplied
a USB UART for communication with a host computer for driving the validation test
data collection.
In the original methodology, the power trace is measured differentially across
a shunt resistor in series with the device under test. The Digilent Nexys 4 DDR
Development Board was powered over the USB connection with the host computer,
and the FPGA, behaving as the DUT, was powered by a collection of LDO DC-DC
power regulators on the development board.
The design of the development board did not readily facilitate the addition of a
series shunt resistor to allow for the methodology-suggested power trace measurement.
However, the development board did provide a broken-out test point for the FPGA’s
primary 3.3V supply voltage (VCCO) at pad J11. I measured the voltage relative
to ground at test point J11. The collected power traces from this test measure the
voltage drop across the DUT in series with the LDO power regulator.
To maximize the voltage fluctuation at test point J11 relative to the power consumption of the FPGA, I used a hot air rework station to remove all filter capacitors with a nominal value of 1 µF or greater connected between VCCO and ground. On the Digilent Nexys 4 DDR, this included the following capacitors [4]:
• C86, C87, C88
• C98, C99, C100
• C122, C123, C124
• C127
• C147
• C180
The capacitors are located on the underside of the Digilent Nexys 4 DDR. A
picture of the board after capacitor removal is included in Figure 5.1.
Figure 5.1: Underside of the Nexys 4 DDR after capacitor removal
The original methodology specifies that the power traces be measured by an oscilloscope or other A/D measurement apparatus with the following properties [15]:
1. Bandwidth of at least 50% of the device clock rate for software implementations and at least 80% of the clock rate for hardware implementations
2. A sampling rate of at least five times the bandwidth
3. A minimum of 8 bits of sampling resolution
4. Enough storage to capture the entire signal required for the test and analysis
To meet these requirements, I measured the power traces using a Keysight InfiniiVision MSO-X 2022A Mixed Signal Oscilloscope with a 200 MHz maximum bandwidth and a 2 GSa/s maximum sample rate [2]. Channel 1 was connected to test point J11, and the external trigger input was connected to PMOD JA Pin 1 on the development board. The native clock speed of the Digilent Nexys 4 DDR Development Board is 100 MHz. To increase the number of samples captured per sponge function clock cycle, and thus the fidelity of the measured power traces, the Xilinx Artix-7 FPGA was internally clocked down to 10 MHz using the Xilinx Clocking Wizard IP [23].
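The requirements above can be checked against this set-up with a few lines of arithmetic. The sketch below uses hypothetical helper names that are not part of the thesis tooling; it takes the 10 MHz DUT clock and the oscilloscope's maximum 200 MHz bandwidth and 2 GSa/s sample rate from the text, interprets requirement 2 against the oscilloscope's own bandwidth, and assumes 8-bit vertical resolution (the capture script reads samples in BYTE format). Requirement 4 (storage) is not modeled.

```python
def meets_trace_requirements(clock_hz: float, bandwidth_hz: float,
                             sample_rate_sps: float, resolution_bits: int,
                             hardware: bool = True) -> bool:
    """Check measurement requirements 1-3 of the validation methodology.

    Hypothetical helper: storage (requirement 4) is not modeled, and the
    sampling-rate rule is applied to the instrument's own bandwidth.
    """
    # Requirement 1: bandwidth >= 80% (hardware) or 50% (software) of the clock
    min_bandwidth = (0.8 if hardware else 0.5) * clock_hz
    # Requirement 2: sampling rate >= 5x the bandwidth
    min_sample_rate = 5 * bandwidth_hz
    # Requirement 3: at least 8 bits of sampling resolution
    return (bandwidth_hz >= min_bandwidth
            and sample_rate_sps >= min_sample_rate
            and resolution_bits >= 8)


def samples_per_clock_cycle(sample_rate_sps: float, clock_hz: float) -> float:
    """Number of oscilloscope samples that fall within one DUT clock period."""
    return sample_rate_sps / clock_hz


# Values from the set-up above; the 8-bit resolution is an assumption
print(meets_trace_requirements(10e6, 200e6, 2e9, 8))  # True
print(samples_per_clock_cycle(2e9, 100e6))  # 20.0 at the native 100 MHz clock
print(samples_per_clock_cycle(2e9, 10e6))   # 200.0 after clocking down to 10 MHz
```

Both clock rates satisfy the bandwidth requirement; clocking down to 10 MHz mainly gives ten times as many samples per clock cycle at the oscilloscope's maximum sample rate.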
A finite state machine encapsulated the Keccak Sponge Function on the FPGA and coordinated communication with the host computer over the USB UART interface. The DUT emitted a pulse on a GPIO output to trigger the oscilloscope, so that each captured power trace is synchronized with the computation. Algorithm 1 summarizes the power trace collection procedure in pseudocode; the complete Python script that drives the trace collection can be found in Appendix A.3.
Result: A set of power traces
openFpga();
openScope();
openDatabase();
configureOscilloscope();
numTraces = 0;
while numTraces < desiredNumberOfTraces do
    if !fixedInput then
        input = getRandomBytes(126);
    else
        input = SELECTED_FIXED_INPUT;
    end
    paddedInput = pad101(input);
    expectedOutput = SHAKE128(input);
    startScopeCapture();
    fpgaOutput = sendReceive(paddedInput);
    trace = getScopeTrace();
    if expectedOutput == fpgaOutput then
        writeToDatabase(input, trace);
        numTraces += 1;
    end
end
Algorithm 1: Power Trace Collection Procedure
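The pad101 step in Algorithm 1 is not spelled out. A minimal sketch of byte-oriented pad10*1 padding follows, assuming a rate of 128 bytes (1024 bits); that rate is inferred from the capture script in Appendix A.3, which appends b'\x01\x80' to each 126-byte input so that it fills exactly one 128-byte block. The rate parameter is an assumption, and this is the original Keccak padding: the FIPS 202 SHA-3 and SHAKE functions [14] additionally append domain-separation bits before pad10*1.

```python
def pad101(message: bytes, rate_bytes: int) -> bytes:
    """Byte-oriented Keccak multi-rate padding (pad10*1).

    Appends a single 1 bit, the minimum number of 0 bits, and a final 1 bit so
    that the result is a whole number of rate-sized blocks. Keccak numbers bits
    from the least significant end of each byte, so the first padding bit is
    0x01 and the last padding bit is 0x80.
    """
    pad_len = rate_bytes - (len(message) % rate_bytes)
    if pad_len == 1:
        # Both padding bits fall in the same, final byte
        return message + b"\x81"
    return message + b"\x01" + bytes(pad_len - 2) + b"\x80"


# A 126-byte input with an assumed 128-byte rate gains exactly b"\x01\x80",
# the same two bytes the capture script appends
padded = pad101(bytes(126), 128)
print(len(padded), padded[-2:])  # 128 b'\x01\x80'
```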
To facilitate synchronization and alignment of power traces, the finite state machine outputs a trigger edge on PMOD JA Pin 1 at the beginning and end of the sponge function permutation.
5.1.2 DUT Input Selection and Partitioning
In the testing methodology, the input to the DUT is partitioned into two data sets. Data set A consists of the power traces collected for at least 5000 randomly generated inputs, and data set B consists of the power traces for at least 5000 runs of a single, fixed input. For the fixed input, I selected the following value at random (in hex):
5c2c 43fe c6a3 87d8 763b 79af 7ca2 d038
441b ac29 5074 9df2 3a4c 1ee6 7ccb a9a7
0019 5a70 864a 557f cc82 9bde 0762 3218
946f 243f b96f 9478 d840 689a 8462 12e5
1296 76ac 64c0 91de b523 1c17 ec92 b4ef
84f1 a242 e26f 50ce 11e6 5b34 ced4 5034
3fdf 2ee6 97d6 f2f1 c05e 4e16 b816 a21c
97eb 152c 4625 aed2 62ed 59b6 ee58
An arbitrary fixed input is appropriate for this test because any ability to distinguish between inputs to the function from the DUT's power trace represents a vulnerability of the DUT to power analysis attacks. For each data partition and each implementation, I collected at least 100,000 traces.
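The hex listing of the fixed input is the same 126-byte value that appears as STATIC_INPUT in the capture script (Appendix A.3), where it is written as a Python bytes literal. The short check below, which is illustrative and not part of the original tooling, converts the listing into bytes and compares the two.

```python
# Hex listing of the fixed input, as printed in the text
FIXED_INPUT_HEX = """
5c2c 43fe c6a3 87d8 763b 79af 7ca2 d038
441b ac29 5074 9df2 3a4c 1ee6 7ccb a9a7
0019 5a70 864a 557f cc82 9bde 0762 3218
946f 243f b96f 9478 d840 689a 8462 12e5
1296 76ac 64c0 91de b523 1c17 ec92 b4ef
84f1 a242 e26f 50ce 11e6 5b34 ced4 5034
3fdf 2ee6 97d6 f2f1 c05e 4e16 b816 a21c
97eb 152c 4625 aed2 62ed 59b6 ee58
"""

# Strip all whitespace, then decode pairs of hex digits into bytes
fixed_input = bytes.fromhex("".join(FIXED_INPUT_HEX.split()))

# STATIC_INPUT as written in the capture script in Appendix A.3
STATIC_INPUT = (b'\\,C\xfe\xc6\xa3\x87\xd8v;y\xaf|\xa2\xd08D\x1b\xac)Pt\x9d'
                b'\xf2:L\x1e\xe6|\xcb\xa9\xa7\x00\x19Zp\x86JU\x7f\xcc\x82'
                b'\x9b\xde\x07b2\x18\x94o$?\xb9o\x94x\xd8@h\x9a\x84b\x12\xe5'
                b'\x12\x96v\xacd\xc0\x91\xde\xb5#\x1c\x17\xec\x92\xb4\xef'
                b'\x84\xf1\xa2B\xe2oP\xce\x11\xe6[4\xce\xd4P4?\xdf.\xe6\x97'
                b'\xd6\xf2\xf1\xc0^N\x16\xb8\x16\xa2\x1c\x97\xeb\x15,F%\xae'
                b'\xd2b\xedY\xb6\xeeX')

print(len(fixed_input))             # 126
print(fixed_input == STATIC_INPUT)  # True
```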
Figure 5.2: Experimental set-up for power trace collection
5.1.3 T-Statistic Calculation and Analysis
According to the testing methodology [15], the two sets of power traces are combined point-wise into a single trace, called the T-statistic, as follows:
Data:
Symbol   Description
$X_A$    The point-wise average of all traces in data set A
$X_B$    The point-wise average of all traces in data set B
$S_A$    The point-wise standard deviation of all traces in data set A
$S_B$    The point-wise standard deviation of all traces in data set B
$N_A$    The number of traces in data set A
$N_B$    The number of traces in data set B
Result: $T$, the T-statistic trace for a DUT

$$T = \frac{X_A - X_B}{\sqrt{\dfrac{S_A^2}{N_A} + \dfrac{S_B^2}{N_B}}}$$

Algorithm 2: T-Statistic Calculation
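Algorithm 2 can be sketched in plain Python as follows. This is an illustrative implementation, not the analysis code used in this work: it assumes each data set is a list of equal-length traces that have already been decompressed into numeric samples, that each set holds at least two traces, and it uses the sample standard deviation (n − 1 denominator) for $S_A$ and $S_B$.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence


def t_statistic_trace(traces_a: Sequence[Sequence[float]],
                      traces_b: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise Welch t-statistic between data sets A and B (Algorithm 2)."""
    n_a, n_b = len(traces_a), len(traces_b)
    t_trace = []
    # zip(*traces) regroups the traces into one tuple of samples per time point
    for col_a, col_b in zip(zip(*traces_a), zip(*traces_b)):
        x_a, x_b = mean(col_a), mean(col_b)      # X_A, X_B
        s_a, s_b = stdev(col_a), stdev(col_b)    # S_A, S_B (sample std. dev.)
        denom = sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
        # T is undefined where both data sets have zero variance
        t_trace.append((x_a - x_b) / denom if denom > 0 else float("nan"))
    return t_trace


# Two tiny hypothetical data sets of two-sample traces
set_a = [[1.0, 2.0], [3.0, 4.0]]
set_b = [[0.0, 2.0], [2.0, 4.0]]
print(t_statistic_trace(set_a, set_b))  # first point about 0.707, second point 0.0
```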
The pass/fail criterion for a DUT is whether the T-statistic trace exceeds a threshold of ±C at any point along the trace; if it does, the DUT fails. The higher the value of C, the more correlation between the power trace and the input data a device may exhibit and still pass the validation criterion.
To validate the power analysis resistance of the protected implementations of the Keccak Sponge Function, I compared the maximum absolute value of the T-statistic for each of the Keccak implementations. To show that the protected implementations significantly limit the correlation between input data and power trace, I show that the maximum excursion of the T-statistic for each protected implementation is much lower than the maximum excursion of the T-statistic for the unprotected implementation.
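The pass/fail rule and the maximum-excursion comparison reduce to two small helper functions. The function names below are hypothetical, and the example T-statistic values and thresholds are made up purely for illustration.

```python
from typing import Sequence


def max_excursion(t_trace: Sequence[float]) -> float:
    """Largest absolute value of the T-statistic anywhere along the trace."""
    return max(abs(t) for t in t_trace)


def passes_validation(t_trace: Sequence[float], threshold_c: float) -> bool:
    """The DUT fails if the T-statistic exceeds +/-C at any point along the trace."""
    return max_excursion(t_trace) <= threshold_c


# Hypothetical T-statistic samples and thresholds, for illustration only
example = [0.5, -3.2, 2.1, -0.4]
print(max_excursion(example))           # 3.2
print(passes_validation(example, 5.0))  # True
print(passes_validation(example, 3.0))  # False
```

Comparing implementations by their maximum excursion amounts to finding the smallest threshold C each one would pass.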
5.2 Validation Results
[Figure 5.3 plot: three panels titled Single Share, Three Share Threshold, and Three Share Uniform; each horizontal axis spans 0 to 10000 and each vertical axis spans −200 to 200.]
Figure 5.3: T-statistics over 5000 collected power traces
Table 5.1: T-Statistic Maximum Excursions
Implementation              Unprotected   Three-Share   Uniformity-Preserving
Maximum |T|                 197.209       13.766        15.939
Normalized to Unprotected   1             0.0698        0.0808
The T-statistics for each of the implementations of the Keccak Sponge Function are shown in Figure 5.3, and their respective maximum excursions are listed in Table 5.1. The results show that the protected implementations meet a much stricter criterion for power analysis validation. Under these conditions, the unprotected implementation fails validation for any threshold C below 197.209, whereas both protected implementations pass for any threshold C of 15.939 or above. This represents a more than 12-fold reduction in the smallest threshold C at which the sponge function passes validation, showing that the protected implementations significantly improve the resistance of the sponge function to power analysis-based side-channel attacks.
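The normalized values in Table 5.1 and the 12-fold figure follow directly from the reported maximum excursions. The short check below reproduces that arithmetic; the numbers are taken from Table 5.1 and nothing else is assumed.

```python
# Maximum |T| values reported in Table 5.1
max_t = {
    "Unprotected": 197.209,
    "Three-Share": 13.766,
    "Uniformity-Preserving": 15.939,
}

baseline = max_t["Unprotected"]
for name, value in max_t.items():
    # Normalized excursion (as in Table 5.1) and reduction factor vs. unprotected
    print(f"{name}: normalized {value / baseline:.4f}, reduced {baseline / value:.1f}x")

# The weaker of the two protected designs still reduces max |T| more than 12-fold
worst_protected = max(max_t["Three-Share"], max_t["Uniformity-Preserving"])
print(baseline / worst_protected > 12)  # True
```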
Chapter 6
CONCLUSION
This work demonstrates that post-synthesis implementations of the power side-channel protection techniques described by Bertoni et al. [10] and Bilgin et al. [12] for the Keccak Sponge Function measurably decrease the correlation between the data input to the function and the power consumption of the algorithm. This extends the results of prior work by showing that the techniques work not only in simulation but also in practice, on a Xilinx Artix-7 FPGA. Additionally, this work publishes an open-source, functional HDL implementation of the Keccak Sponge Function that is resistant to simple and differential power analysis attacks, together with measurements of the effect of the protection techniques on design resources, including chip area and power consumption, and the results of the Rambus validation methodology test, which show that the published implementation is protected against power analysis attacks.
BIBLIOGRAPHY
[1] Artix-7 FPGA family. https://www.xilinx.com/products/silicon-devices/fpga/artix-7.html.
[2] Keysight InfiniiVision 2000 X-Series. https://literature.cdn.keysight.com/litweb/pdf/5990-6679EN.pdf?id=1999123.
[3] Nexys 4 DDR. https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/start.
[4] Nexys 4 DDR schematic. https://reference.digilentinc.com/learn/documentation/schematics/nexys-4-ddr-schematic.
[5] Vivado Design Suite. https://www.xilinx.com/products/design-tools/vivado.html.
[6] E. Andreeva, B. Mennink, B. Preneel, and M. Škrobot. Security analysis and comparison of the SHA-3 finalists BLAKE, Grøstl, JH, Keccak, and Skein. In International Conference on Cryptology in Africa, pages 287–305. Springer, 2012.
[7] J.-P. Aumasson and W. Meier. Zero-sum distinguishers for reduced Keccak-f and for the core functions of Luffa and Hamsi. Rump session of Cryptographic Hardware and Embedded Systems (CHES), 2009:67, 2009.
[8] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Sponge functions. In ECRYPT Hash Workshop, volume 2007. Citeseer, 2007.
[9] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Keccak sponge function family main document. Submission to NIST (Round 2), 3(30), 2009.
[10] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Building power analysis resistant implementations of Keccak. In Second SHA-3 Candidate Conference, volume 142. Citeseer, 2010.
[11] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche. Hardware implementation in VHDL. https://keccak.team/archives.html, 2016.
[12] B. Bilgin, J. Daemen, V. Nikov, S. Nikova, V. Rijmen, and G. Van Assche. Efficient and first-order DPA resistant implementations of Keccak. In International Conference on Smart Card Research and Advanced Applications, pages 187–199. Springer, 2013.
[13] M. J. Bradley and D. L. Finn. Additivity + homogeneity. College Mathematics Journal, 30(2):133–135, March 1999.
[14] FIPS PUB 202. Secure Hash Algorithm-3 (SHA-3) standard: Permutation-based hash and extendable-output functions. National Institute of Standards and Technology (NIST), 2014.
[15] G. Goodwill, B. Jun, J. Jaffe, and P. Rohatgi. A testing methodology for side-channel resistance validation. In NIST Non-Invasive Attack Testing Workshop, 2011.
[16] J. Guo, M. Liu, and L. Song. Linear structures: Applications to cryptanalysis of round-reduced Keccak. In International Conference on the Theory and Application of Cryptology and Information Security, pages 249–274. Springer, 2016.
[17] R. McEvoy, M. Tunstall, C. C. Murphy, and W. P. Marnane. Differential power analysis of HMAC based on SHA-2, and countermeasures. In International Workshop on Information Security Applications, pages 317–332. Springer, 2007.
[18] P. Morawiecki, J. Pieprzyk, and M. Srebrny. Rotational cryptanalysis of round-reduced Keccak. In International Workshop on Fast Software Encryption, pages 241–262. Springer, 2013.
[19] P. Morawiecki and M. Srebrny. A SAT-based preimage analysis of reduced Keccak hash functions. Information Processing Letters, 113(10-11):392–397, 2013.
[20] B. Schneier. One-way hash functions. Applied Cryptography, Second Edition, 20th Anniversary Edition, pages 429–459, 1996.
[21] B. Schneier. Cryptographic design vulnerabilities. Computer, 31(9):29–33, 1998.
[22] Xilinx. Block memory generator. https://www.xilinx.com/products/intellectual-property/block_memory_generator.html, February 2017.
[23] Xilinx. Clocking wizard. https://www.xilinx.com/products/intellectual-property/clocking_wizard.html, February 2017.
APPENDICES
Appendix A
CODE LISTINGS
A.1 Threshold Round Permutation Module
-- The Keccak sponge function, designed by Guido Bertoni, Joan Daemen,
-- Michaël Peeters and Gilles Van Assche. For more information, feedback or
-- questions, please refer to our website: http://keccak.noekeon.org/
-- Implementation by the designers,
-- hereby denoted as "the implementer".
-- To the extent possible under law, the implementer has waived all copyright
-- and related or neighboring rights to the source code in this file.
-- http://creativecommons.org/publicdomain/zero/1.0/
library work;
use work.keccak_globals.all;
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
entity keccak_round_multi_share is
port (
    round_in              : in  k_state;
    round_constant_signal : in  std_logic_vector(63 downto 0);
    pi_state_out          : out k_state;
    iota_state_in         : in  k_state;
    iota_en               : in  std_logic;
    round_out             : out k_state);
end keccak_round_multi_share;
architecture rtl of keccak_round_multi_share is
----------------------------------------------------------------------------
-- Internal signal declarations
----------------------------------------------------------------------------
signal theta_in, theta_out, pi_in, pi_out, rho_in, rho_out, iota_in, iota_out : k_state;
signal sum_sheet : k_plane;
begin -- Rtl
--connections
--order theta, rho, pi, chi, iota
theta_in <= round_in;
rho_in <= theta_out;
pi_in <= rho_out;
-- route chi out of the share for uniformity
pi_state_out <= pi_out; -- input to chi
iota_in <= iota_state_in; -- output from chi
round_out <= iota_out;
--theta
--compute sum of columns
i0101: for x in 0 to 4 generate
    i0102: for i in 0 to 63 generate
        sum_sheet(x)(i) <= theta_in(0)(x)(i) xor theta_in(1)(x)(i) xor theta_in(2)(x)(i) xor theta_in(3)(x)(i) xor theta_in(4)(x)(i);
    end generate;
end generate;
i0200: for y in 0 to 4 generate
    i0201: for x in 1 to 3 generate
        theta_out(y)(x)(0) <= theta_in(y)(x)(0) xor sum_sheet(x-1)(0) xor sum_sheet(x+1)(63);
        i0202: for i in 1 to 63 generate
            theta_out(y)(x)(i) <= theta_in(y)(x)(i) xor sum_sheet(x-1)(i) xor sum_sheet(x+1)(i-1);
        end generate;
    end generate;
end generate;
i2001: for y in 0 to 4 generate
    theta_out(y)(0)(0) <= theta_in(y)(0)(0) xor sum_sheet(4)(0) xor sum_sheet(1)(63);
    i2021: for i in 1 to 63 generate
        theta_out(y)(0)(i) <= theta_in(y)(0)(i) xor sum_sheet(4)(i) xor sum_sheet(1)(i-1);
    end generate;
end generate;
i2002: for y in 0 to 4 generate
    theta_out(y)(4)(0) <= theta_in(y)(4)(0) xor sum_sheet(3)(0) xor sum_sheet(0)(63);
    i2022: for i in 1 to 63 generate
        theta_out(y)(4)(i) <= theta_in(y)(4)(i) xor sum_sheet(3)(i) xor sum_sheet(0)(i-1);
    end generate;
end generate;
-- pi
i3001: for y in 0 to 4 generate
    i3002: for x in 0 to 4 generate
        i3003: for i in 0 to 63 generate
            --pi_out(y)(x)(i)<=pi_in((y+2*x) mod 5)(((4*y)+x) mod 5)(i);
            pi_out((2*x+3*y) mod 5)(0*x+1*y)(i) <= pi_in(y)(x)(i);
        end generate;
    end generate;
end generate;
--rho
i4001: for i in 0 to 63 generate
    rho_out(0)(0)(i) <= rho_in(0)(0)(i);
end generate;
i4002: for i in 0 to 63 generate
    rho_out(0)(1)(i) <= rho_in(0)(1)((i-1) mod 64);
end generate;
i4003: for i in 0 to 63 generate
    rho_out(0)(2)(i) <= rho_in(0)(2)((i-62) mod 64);
end generate;
i4004: for i in 0 to 63 generate
    rho_out(0)(3)(i) <= rho_in(0)(3)((i-28) mod 64);
end generate;
i4005: for i in 0 to 63 generate
    rho_out(0)(4)(i) <= rho_in(0)(4)((i-27) mod 64);
end generate;
i4011: for i in 0 to 63 generate
    rho_out(1)(0)(i) <= rho_in(1)(0)((i-36) mod 64);
end generate;
i4012: for i in 0 to 63 generate
    rho_out(1)(1)(i) <= rho_in(1)(1)((i-44) mod 64);
end generate;
i4013: for i in 0 to 63 generate
    rho_out(1)(2)(i) <= rho_in(1)(2)((i-6) mod 64);
end generate;
i4014: for i in 0 to 63 generate
    rho_out(1)(3)(i) <= rho_in(1)(3)((i-55) mod 64);
end generate;
i4015: for i in 0 to 63 generate
    rho_out(1)(4)(i) <= rho_in(1)(4)((i-20) mod 64);
end generate;
i4021: for i in 0 to 63 generate
    rho_out(2)(0)(i) <= rho_in(2)(0)((i-3) mod 64);
end generate;
i4022: for i in 0 to 63 generate
    rho_out(2)(1)(i) <= rho_in(2)(1)((i-10) mod 64);
end generate;
i4023: for i in 0 to 63 generate
    rho_out(2)(2)(i) <= rho_in(2)(2)((i-43) mod 64);
end generate;
i4024: for i in 0 to 63 generate
    rho_out(2)(3)(i) <= rho_in(2)(3)((i-25) mod 64);
end generate;
i4025: for i in 0 to 63 generate
    rho_out(2)(4)(i) <= rho_in(2)(4)((i-39) mod 64);
end generate;
i4031: for i in 0 to 63 generate
    rho_out(3)(0)(i) <= rho_in(3)(0)((i-41) mod 64);
end generate;
i4032: for i in 0 to 63 generate
    rho_out(3)(1)(i) <= rho_in(3)(1)((i-45) mod 64);
end generate;
i4033: for i in 0 to 63 generate
    rho_out(3)(2)(i) <= rho_in(3)(2)((i-15) mod 64);
end generate;
i4034: for i in 0 to 63 generate
    rho_out(3)(3)(i) <= rho_in(3)(3)((i-21) mod 64);
end generate;
i4035: for i in 0 to 63 generate
    rho_out(3)(4)(i) <= rho_in(3)(4)((i-8) mod 64);
end generate;
i4041: for i in 0 to 63 generate
    rho_out(4)(0)(i) <= rho_in(4)(0)((i-18) mod 64);
end generate;
i4042: for i in 0 to 63 generate
    rho_out(4)(1)(i) <= rho_in(4)(1)((i-2) mod 64);
end generate;
i4043: for i in 0 to 63 generate
    rho_out(4)(2)(i) <= rho_in(4)(2)((i-61) mod 64);
end generate;
i4044: for i in 0 to 63 generate
    rho_out(4)(3)(i) <= rho_in(4)(3)((i-56) mod 64);
end generate;
i4045: for i in 0 to 63 generate
    rho_out(4)(4)(i) <= rho_in(4)(4)((i-14) mod 64);
end generate;
--iota
i5001: for y in 1 to 4 generate
    i5002: for x in 0 to 4 generate
        i5003: for i in 0 to 63 generate
            iota_out(y)(x)(i) <= iota_in(y)(x)(i);
        end generate;
    end generate;
end generate;
i5012: for x in 1 to 4 generate
    i5013: for i in 0 to 63 generate
        iota_out(0)(x)(i) <= iota_in(0)(x)(i);
    end generate;
end generate;
i5103: for i in 0 to 63 generate
    iota_out(0)(0)(i) <= iota_in(0)(0)(i) xor (round_constant_signal(i) and iota_en);
end generate;
end rtl;
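The rotation constants in the rho section and the index mapping in the pi section of the listing above can be cross-checked against the Keccak specification. The Python sketch below is not part of the original tooling: it recomputes the rho offsets with the specification's walk, r = (t + 1)(t + 2)/2 mod 64 with (x, y) ← (y, 2x + 3y mod 5) starting from (1, 0), and compares them with the constants transcribed from the listing, whose rows are indexed by y and columns by x.

```python
def keccak_rho_offsets():
    """Compute Keccak rho rotation offsets r[x][y] via the specification's walk."""
    r = [[0] * 5 for _ in range(5)]
    x, y = 1, 0
    for t in range(24):
        r[x][y] = ((t + 1) * (t + 2) // 2) % 64
        x, y = y, (2 * x + 3 * y) % 5
    return r


# Offsets transcribed from the listing: rho_out(y)(x)(i) <= rho_in(y)(x)((i - r) mod 64),
# i.e. lane (x, y) is rotated left by r. Rows are y, columns are x.
LISTING_OFFSETS = [
    [0, 1, 62, 28, 27],   # y = 0
    [36, 44, 6, 55, 20],  # y = 1
    [3, 10, 43, 25, 39],  # y = 2
    [41, 45, 15, 21, 8],  # y = 3
    [18, 2, 61, 56, 14],  # y = 4
]

r = keccak_rho_offsets()
rho_ok = all(LISTING_OFFSETS[y][x] == r[x][y] for x in range(5) for y in range(5))
print(rho_ok)  # True

# pi: the listing writes pi_in(y)(x) to pi_out((2x + 3y) mod 5)(y), so lane (x, y)
# moves to (x', y') = (y, 2x + 3y mod 5); every destination must be distinct
destinations = {(y, (2 * x + 3 * y) % 5) for x in range(5) for y in range(5)}
print(len(destinations) == 25)  # True
```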
A.2 Uniform Chi
----------------------------------------------------------------------------------
-- Group: Cal Poly CPE SHA-3 Research Team
-- Engineer: Nathaniel Graff
--
-- Create Date: 12/06/2016 03:38:12 PM
-- Design Name: chi_uniform
-- Module Name: chi_uniform - Behavioral
-- Project Name: Keccak Research
-- Description: Uniform three-share chi step implementation
--
----------------------------------------------------------------------------------
library work;
use work.keccak_globals.all;
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
entity chi_uniform is
port (
    chi_rand_bits : in  std_logic_vector(3 downto 0);
    chi_in_a      : in  k_state;
    chi_in_b      : in  k_state;
    chi_in_c      : in  k_state;
    chi_out_a     : out k_state;
    chi_out_b     : out k_state;
    chi_out_c     : out k_state);
end chi_uniform;
architecture Behavioral of chi_uniform is
-- Intermediate values to store the output of chi prime
signal chi_prime_a, chi_prime_b, chi_prime_c : k_state;
-- Random bits
signal p, s : std_logic_vector(1 downto 0);
begin
-- Map random bits into p and s for easy consumption
p <= chi_rand_bits(1 downto 0);
s <= chi_rand_bits(3 downto 2);
-- Chi prime
i0000: for y in 0 to 4 generate
    i0001: for x in 0 to 4 generate
        i0002: for i in 0 to 63 generate
            chi_prime_a(y)(x)(i) <= chi_in_b(y)(x)(i) xor
                ((not chi_in_b(y)((x+1) mod 5)(i)) and chi_in_b(y)((x+2) mod 5)(i)) xor
                (chi_in_b(y)((x+1) mod 5)(i) and chi_in_c(y)((x+2) mod 5)(i)) xor
                (chi_in_b(y)((x+2) mod 5)(i) and chi_in_c(y)((x+1) mod 5)(i));
            chi_prime_b(y)(x)(i) <= chi_in_c(y)(x)(i) xor
                ((not chi_in_c(y)((x+1) mod 5)(i)) and chi_in_c(y)((x+2) mod 5)(i)) xor
                (chi_in_c(y)((x+1) mod 5)(i) and chi_in_a(y)((x+2) mod 5)(i)) xor
                (chi_in_c(y)((x+2) mod 5)(i) and chi_in_a(y)((x+1) mod 5)(i));
            chi_prime_c(y)(x)(i) <= chi_in_a(y)(x)(i) xor
                ((not chi_in_a(y)((x+1) mod 5)(i)) and chi_in_a(y)((x+2) mod 5)(i)) xor
                (chi_in_a(y)((x+1) mod 5)(i) and chi_in_b(y)((x+2) mod 5)(i)) xor
                (chi_in_a(y)((x+2) mod 5)(i) and chi_in_b(y)((x+1) mod 5)(i));
        end generate;
    end generate;
end generate;
-- Propagating rows not included in further steps
i0010: for y in 0 to 4 generate
    i0011: for x in 0 to 2 generate
        i0012: for i in 0 to 63 generate
            chi_out_a(y)(x)(i) <= chi_prime_a(y)(x)(i);
            chi_out_b(y)(x)(i) <= chi_prime_b(y)(x)(i);
            chi_out_c(y)(x)(i) <= chi_prime_c(y)(x)(i);
        end generate;
    end generate;
end generate;
-- Injection of random bits
i0021: for x in 3 to 4 generate
    i0022: for i in 0 to 63 generate
        chi_out_a(0)(x)(i) <= chi_prime_a(0)(x)(i) xor p(x-3) xor s(x-3);
        chi_out_b(0)(x)(i) <= chi_prime_b(0)(x)(i) xor p(x-3);
        chi_out_c(0)(x)(i) <= chi_prime_c(0)(x)(i) xor s(x-3);
    end generate;
end generate;
-- Mixing to jointly satisfy uniformity
i0030: for y in 1 to 4 generate
    i0031: for x in 3 to 4 generate
        i0032: for i in 0 to 63 generate
            chi_out_a(y)(x)(i) <= chi_prime_a(y)(x)(i) xor chi_in_a(y-1)(x)(i) xor chi_in_b(y-1)(x)(i);
            chi_out_b(y)(x)(i) <= chi_prime_b(y)(x)(i) xor chi_in_a(y-1)(x)(i);
            chi_out_c(y)(x)(i) <= chi_prime_c(y)(x)(i) xor chi_in_b(y-1)(x)(i);
        end generate;
    end generate;
end generate;
end Behavioral;
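Correctness of the uniform chi listing can be checked with a small software model: the three output shares must still XOR to the unshared chi output, because the random-bit injection and the row-mixing terms each cancel when the shares are recombined. The sketch below is a hypothetical model written for illustration, not code from this work; it stores 64-bit lanes as Python integers, indexes states as [y][x] like the listing, and treats chi_rand_bits as a 4-bit integer whose bits are each XORed into every bit of a lane.

```python
import random

MASK = (1 << 64) - 1  # lanes are 64-bit words


def chi(state):
    """Unshared Keccak chi on a 5x5 state indexed [y][x]."""
    return [[row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5]) for x in range(5)]
            for row in state]


def _chi_prime(p, q, y, x):
    """One chi-prime output lane computed from shares p and q, as in the listing."""
    x1, x2 = (x + 1) % 5, (x + 2) % 5
    return (p[y][x] ^ (~p[y][x1] & p[y][x2])
            ^ (p[y][x1] & q[y][x2]) ^ (p[y][x2] & q[y][x1]))


def chi_uniform(a, b, c, rand_bits):
    """Model of the chi_uniform entity; returns the three output shares."""
    # p = chi_rand_bits(1 downto 0), s = chi_rand_bits(3 downto 2), expanded to lanes
    p = [MASK if (rand_bits >> k) & 1 else 0 for k in (0, 1)]
    s = [MASK if (rand_bits >> k) & 1 else 0 for k in (2, 3)]
    out_a = [[_chi_prime(b, c, y, x) for x in range(5)] for y in range(5)]
    out_b = [[_chi_prime(c, a, y, x) for x in range(5)] for y in range(5)]
    out_c = [[_chi_prime(a, b, y, x) for x in range(5)] for y in range(5)]
    for x in (3, 4):
        # Injection of random bits in row 0
        out_a[0][x] ^= p[x - 3] ^ s[x - 3]
        out_b[0][x] ^= p[x - 3]
        out_c[0][x] ^= s[x - 3]
        # Mixing with input shares of the previous row in rows 1 to 4
        for y in range(1, 5):
            out_a[y][x] ^= a[y - 1][x] ^ b[y - 1][x]
            out_b[y][x] ^= a[y - 1][x]
            out_c[y][x] ^= b[y - 1][x]
    return out_a, out_b, out_c


def random_state(rng):
    return [[rng.getrandbits(64) for _ in range(5)] for _ in range(5)]


rng = random.Random(2018)
a, b, c = random_state(rng), random_state(rng), random_state(rng)
oa, ob, oc = chi_uniform(a, b, c, rng.getrandbits(4))
recombined = [[oa[y][x] ^ ob[y][x] ^ oc[y][x] for x in range(5)] for y in range(5)]
unshared = chi([[a[y][x] ^ b[y][x] ^ c[y][x] for x in range(5)] for y in range(5)])
print(recombined == unshared)  # True
```

This checks only functional correctness of the sharing; it says nothing about uniformity or leakage, which the T-statistic measurements address.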
A.3 Power Trace Capture Script
#!/usr/bin/env python3
import serial
import pyvisa as visa  # PyVISA; older releases also exposed this package as "visa"
import psycopg2
import sys
import os
import codecs
import lzma
import matplotlib.pyplot as pyplot
from time import sleep
from tabulate import tabulate
import keccak

FPGA_PORT = ""
DBHOST = ""
DBNAME = ""
DBUSER = ""
DBPASS = ""
CAPTURE_TRACE_COUNT = 100000
CONFIG_NUMBER = 0
ENABLE_STATIC = False
STATIC_INPUT = b'\\,C\xfe\xc6\xa3\x87\xd8v;y\xaf|\xa2\xd08D\x1b\xac)Pt\x9d' + \
    b'\xf2:L\x1e\xe6|\xcb\xa9\xa7\x00\x19Zp\x86JU\x7f\xcc\x82' + \
    b'\x9b\xde\x07b2\x18\x94o$?\xb9o\x94x\xd8@h\x9a\x84b\x12\xe5' + \
    b'\x12\x96v\xacd\xc0\x91\xde\xb5#\x1c\x17\xec\x92\xb4\xef' + \
    b'\x84\xf1\xa2B\xe2oP\xce\x11\xe6[4\xce\xd4P4?\xdf.\xe6\x97' + \
    b'\xd6\xf2\xf1\xc0^N\x16\xb8\x16\xa2\x1c\x97\xeb\x15,F%\xae' + \
    b'\xd2b\xedY\xb6\xeeX'
def getScope():
    rm = visa.ResourceManager()
    instruments = rm.list_resources()
    if len(instruments) == 0:
        print("No instruments connected")
        sys.exit()
    elif len(instruments) == 1:
        scope = rm.open_resource(instruments[0])
        print("Connected to instrument: " + instruments[0])
    else:
        print("Please select an instrument:")
        for num, instrument in enumerate(instruments):
            print(str(num) + " - " + instrument)
        num = int(input("Instrument number: "))
        scope = rm.open_resource(instruments[num])
    scope.timeout = 10000  # milliseconds
    scope.read_termination = '\n'
    scope.write_termination = '\n'
    scope.clear()
    print("Querying scope identity...")
    instrumentdata = scope.query("*IDN?").split(',')
    print(tabulate([instrumentdata], ["Manufacturer", "Model", "Serial #", "Software Version"], tablefmt="psql"))
    scope.write("*RST")
    scope.query("*OPC?")
    return scope
def scopeCommand(scope, command):
    # send a command, then block until the oscilloscope reports completion
    scope.write(command)
    scope.query("*OPC?")

def getFPGA():
    return serial.Serial(FPGA_PORT, timeout=1)

def rearrangeHex(hexstr):
    # reverse byte order within each 64-bit (16 hex digit) word
    bout = b''
    for word in [hexstr[16*n:16*(n+1)] for n in range(len(hexstr) // 16)]:
        bytelist = [word[2*n:2*(n+1)] for n in range(len(word) // 2)]
        bytelist.reverse()
        bout += b''.join(bytelist)
    return bout

def rearrangeBytes(data):
    return codecs.decode(rearrangeHex(codecs.encode(data, 'hex_codec')), 'hex_codec')
def getTrace(scope):
    scope.query("*OPC?")
    trace = b''
    scopeCommand(scope, ":WAVeform:SOURce CHANnel1")
    scopeCommand(scope, ":WAVeform:FORMat BYTE")
    scopeCommand(scope, ":WAVeform:UNSigned ON")
    scopeCommand(scope, ":WAVeform:BYTeorder MSBFirst")
    scopeCommand(scope, ":WAVeform:POINts MAXimum")
    length = int(scope.query(":WAVeform:POINts?"))
    print("Downloading " + str(length) + "-point trace from oscilloscope.")
    scope.write(":WAVeform:DATA?")
    trace += scope.read_raw()
    # drop the 10-byte block header ("#8" plus 8 length digits) and the trailing newline
    return trace[10:-1]

def configCapture(scope):
    scope.query("*OPC?")
    # turn on channel 1
    scopeCommand(scope, ":CHANnel1:DISPlay ON")
    # trigger on the positive edge of the external trigger input
    scopeCommand(scope, ":TRIGger:EDGE:SOURce EXTernal")
    scopeCommand(scope, ":TRIGger:EDGE:SLOPe POSitive")
    scopeCommand(scope, ":TRIGger:EDGE:LEVel 1")
    # configure axes
    scopeCommand(scope, ":CHANnel1:COUPling AC")
    scopeCommand(scope, ":CHANnel1:SCALe 0.1")
    scopeCommand(scope, ":TIMebase:SCALe 0.00000056")
    scopeCommand(scope, ":TIMebase:POSition 0.0000025")
if __name__ == "__main__":
    print("-----------------------------")
    print("| SHA-3 Power Trace Capture |")
    print("-----------------------------")
    # open serial port to FPGA
    print("Opening connection to FPGA")
    fpga = getFPGA()
    # open connection to database
    print("Opening connection to database")
    conn = psycopg2.connect(dbname=DBNAME, user=DBUSER, password=DBPASS, host=DBHOST)
    cur = conn.cursor()
    # open and reset oscilloscope
    print("Opening connection to oscilloscope")
    scope = getScope()
    # configure oscilloscope
    print("Configuring scope")
    configCapture(scope)
    traces = 0
    # main loop
    while True:
        # generate input vector
        input_vector = b''
        if ENABLE_STATIC:
            input_vector = STATIC_INPUT
        else:
            input_vector = os.urandom(126)
        print("Generated input vector:")
        print(input_vector)
        # compute valid hash output
        expected_output = codecs.encode(keccak.hash(input_vector), 'hex_codec')
        print("Expected hash output:")
        print(expected_output)
        print("Setting Single Trace Capture")
        scope.write(":SINGle")
        sleep(0.5)
        # pad input vector
        input_vector += b'\x01\x80'
        # send input vector to FPGA
        print("Sending input vector to FPGA")
        fpga.write(b'\x00\x01')  # single test vector
        fpga.write(rearrangeBytes(input_vector))
        # wait for FPGA
        while fpga.in_waiting == 0:
            sleep(0.1)
        # receive FPGA hash output
        fpga_output = rearrangeHex(codecs.encode(fpga.read(32), 'hex_codec'))
        print("Received FPGA hash output:")
        print(fpga_output)
        # compare hashes
        if expected_output == fpga_output:
            print("SUCCESS: FPGA hash matches expected value")
            # pull trace from oscilloscope
            trace = getTrace(scope)
            # compress trace
            print("Compressing power trace for storage")
            lz = lzma.LZMACompressor()
            comp_trace = lz.compress(trace)
            comp_trace += lz.flush()
            print("Trace compressed with ratio: " + str(len(trace) / len(comp_trace)))
            # write row to database
            print("Writing power trace to database")
            cur.execute('''INSERT INTO traces (config_id, capture_time, input_vector, compressed_trace)
                VALUES (%(config_id)s, now(), %(input_vector)s, %(comp_trace)s)''',
                {'config_id': CONFIG_NUMBER,
                 'input_vector': input_vector,
                 'comp_trace': comp_trace})
            conn.commit()
            traces += 1
            print("Captured " + str(traces) + " traces")
        else:
            print("FAILURE: FPGA hash does not match expected value")
        if traces == CAPTURE_TRACE_COUNT:
            break
    # close database, serial port, and oscilloscope connections
    cur.close()
    conn.close()
    fpga.close()
    scope.close()
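getTrace strips the oscilloscope's IEEE 488.2 definite-length block header with a fixed slice, trace[10:-1], which assumes an 8-digit length field. A slightly more defensive alternative reads the length from the header itself. The sketch below does that, and also shows how a trace stored by the script can be turned back into unsigned 8-bit samples for the T-statistic calculation. Both helper names are hypothetical and not part of the original script, and the sample response is made up.

```python
import lzma


def parse_definite_length_block(raw: bytes) -> bytes:
    """Return the payload of an IEEE 488.2 definite-length block.

    The block is '#', one digit n, n ASCII digits giving the payload length,
    then the payload; anything after the payload (e.g. a newline) is ignored.
    """
    if raw[:1] != b"#":
        raise ValueError("response does not start with '#'")
    n_digits = int(raw[1:2])
    if n_digits == 0:
        raise ValueError("indefinite-length blocks are not handled here")
    length = int(raw[2:2 + n_digits])
    start = 2 + n_digits
    return raw[start:start + length]


def decode_stored_trace(comp_trace: bytes) -> list:
    """Decompress a trace stored by the capture script into unsigned 8-bit samples."""
    return list(lzma.decompress(comp_trace))


# Hypothetical 4-sample response with an 8-digit length field and a trailing newline
raw = b"#800000004" + bytes([10, 128, 200, 255]) + b"\n"
samples = parse_definite_length_block(raw)
print(samples == raw[10:-1])  # True: same result as the script's fixed slice

# Round trip through the same LZMA compression the script applies before storage
lz = lzma.LZMACompressor()
stored = lz.compress(samples) + lz.flush()
print(decode_stored_trace(stored))  # [10, 128, 200, 255]
```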
