Defending cache memory against cold-boot attacks boosted by power or EM radiation analysis by Neagu, Madalin & Manich Bou, Salvador
Author’s Accepted Manuscript
Defending Cache memory against cold-boot attacks
boosted by power or EM radiation analysis
Mădălin Neagu, Salvador Manich
PII: S0026-2692(16)30468-2
DOI: http://dx.doi.org/10.1016/j.mejo.2017.02.010
Reference: MEJ4153
To appear in: Microelectronics Journal
Received date: 3 October 2016
Revised date: 9 January 2017
Accepted date: 16 February 2017
Cite this article as: Mădălin Neagu and Salvador Manich, Defending Cache
memory against cold-boot attacks boosted by power or EM radiation analysis,
Microelectronics Journal, http://dx.doi.org/10.1016/j.mejo.2017.02.010
Defending Cache Memory against Cold-Boot Attacks
Boosted by Power or EM Radiation Analysis
Mădălin Neagu
Department of Computer Science,
Technical University of Cluj-Napoca
Cluj-Napoca, Romania
Madalin.Neagu@cs.utcluj.ro, neagumada@yahoo.com
Salvador Manich
Department of Electronic Engineering
Universitat Politècnica de Catalunya-BarcelonaTech
Barcelona, Spain
Salvador.Manich@upc.edu
Abstract
Some algorithms that handle sensitive data select cache memory as
a type of secure memory where data is confined and not transferred to main
memory. However, cold-boot attacks that target cache memories exploit the
data remanence. Thus, a sudden power shutdown may not delete data entirely,
giving the opportunity to steal data. The biggest challenge for any technique
aiming to secure the cache memory is the performance penalty. Techniques based
on data scrambling have demonstrated that security can be improved with a
limited reduction in performance. However, they still cannot resist side-channel
attacks like power or electromagnetic analysis. This paper presents a review
of known attacks on memories and countermeasures proposed so far and an
improved scrambling technique named random masking interleaved scrambling
technique (RM-ISTe). This method is designed to protect the cache memory
against cold-boot attacks, even if these are boosted by side-channel techniques
like power or electromagnetic analysis.
Keywords: data scrambling, cache memories, differential power analysis,
side-channel attack, error correction
1. Introduction
1.1. Motivation
In a recent report [1], the author pointed out that 62% of companies worldwide
were subject to payment fraud in 2014 and that credit/debit cards are the
second most frequent target of payment fraud. Mobile payments are a relatively
new payment method, but this trend is increasing among large companies and
organizations. However, there are several uncertainties about it, such as
disclosure of sensitive information or secure transfer of information. Ensuring
the confidentiality of sensitive information is becoming more and more crucial [2].
Preprint submitted to Microelectronics Journal February 17, 2017
Figure 1: Simplified model of a computer memory system.
[Diagram: the CPU and the cache are integrated on chip and linked by a high-speed bus (Wrc, Rdc); the main memory is external and linked by a low-speed bus (Wrm, Rdm). Data D is stored in plain form in both memories.]
Very often general purpose devices like desktops, laptops or smartphones are
used for private transactions with financial entities or healthcare issues, among
others. In the case of devices without specialized hardware, all cryptographic
operations are executed in software, resulting in an intensive use of memory [3].
This puts sensitive data at risk, including that stored in the cache memory [4].
Research on SRAMs has demonstrated that data can be maintained almost
intact for a couple of minutes if the chip is kept at low temperatures or even at
room temperature and without power supply [5]. This phenomenon is known
as data remanence, and results show that chip manufacturers do not control
memory retention time as part of their manufacturing quality process. Moreover,
memory retention time varies between devices from the same manufacturer and
of the same type but of different subtype or series. Also, low power versions of
the same chips always seem to have longer retention times. This has opened up
a whole new domain that has been widely investigated in the last decade. The
scope of this work is in this realm.
When the CPU is running, it needs to work with sensitive data in plain
form and, depending on the operating system, it may generate several copies in
memory, exposing data to different kinds of attacks [3]. The problem with data
remanence is that an adversary can use a cold-boot attack to extract critical
data from memory by completely circumventing the software controlling the
CPU [6]. This problem is discussed in depth in the following paragraphs.
1.2. Attacks on memory
A simplified model of a computer memory system is presented in Fig. 1.
Only cache and main memories are shown. Cache memory usually has two
levels, L1 and L2, but this paper focuses on the L2 cache because it is the larger
of the two and its size allows enough data to be stored for full computations,
being considered in some cases as a secure memory [7]. Cache memory is always
integrated in the CPU package, even embedded in the CPU chip or assembled
with it using 3D technology in the most advanced versions. This makes it
feasible to use a high speed bus for reading and writing data, keeping CPU
performance at the maximum level. Main memory is commonly external; it can
be expanded and communicates with cache through a lower speed bus. It is read
only in the event of a cache miss and written according to policies such as
write-through or write-back.
Figure 2: Cold-boot attack on main memory.
[Diagram: after a sudden power cut, the main memory is physically removed and power is restored from a backup system; a fake CPU then reads the memory (Rdm) while data remanence preserves its content.]
Since the CPU reads more frequently than it writes, and often does so in
similar address ranges, the main memory bus is much less used than the cache
memory bus, typically 20 times less. For this reason, loss in transmission speed
in the main memory bus has less impact on CPU performance than in the cache
memory bus.
1.3. Cold-boot attacks
An overview of a cold-boot attack on main memory is illustrated in Fig. 2.
While the software that stores sensitive data in memory is running, the power
supply is suddenly disconnected and the main memory is rapidly removed,
connected to a backup system and powered up again. Then, the victim’s memory
content is downloaded into a backup machine and from there critical data like
encryption keys or any other type of sensitive data is extracted [6]. By cooling
the memory modules, degradation of volatile memory is slowed down; hence, an
adversary has more time to act. The results of the attacks in [6, 8] show that
by cooling a memory chip to -50 °C, decay within a 1 MB region is 0.13% (after
60 minutes). Also, after a single minute without power supply, 99.9% of bits
were recovered correctly.
Figure 3: Cold-boot attack without removing memory chips.
[Diagram: after a sudden power cut, power is restored and a fake OS is booted up; data remanence preserves the content of the cache and main memory.]
This attack is not feasible on a cache memory because the latter cannot be
physically removed. However, the variation shown in Fig. 3 illustrates how an
adversary using a cold-boot attack could steal data.
Once the CPU has run the software of interest and sensitive data has been
stored in memory, power is suddenly cut off and immediately afterwards the
system is booted up with an ad-hoc fake program which makes a backup copy
of the memory. Next, data is analyzed and sensitive information extracted. As
an example, this kind of cold-boot attack was conducted on several smartphones
[9]. A simple reboot from an ad-hoc operating system (FROST) immediately
made a backup copy of the memory content, and the secret keys of encrypted
flash data and other sensitive data were extracted with specialized software
tools. The technique used managed to recover email messages, contact lists,
credit card data and other login credentials after cooling the device to 5 °C.
These smartphones had a boot locking mechanism that ensured the deletion of
data in the user partition and cache memory. However, not all models had this
option activated by default, which might have been unknown to users.
In a recent work [7], several techniques to improve the security of smartphones
and tablets were discussed. In these architectures, two types of memories
internal to the CPU package can execute basic functionalities without accessing
the main RAM, i.e. iRAM and L2 cache. They are considered secure mainly
because after reboot, and this includes unexpected power cut offs, firmware cleans
them up completely. Unfortunately, as noted in [10], firmware can be attacked in
many different ways, and thus it cannot be regarded as a strong security pillar.
The rest of the paper is organized as follows. In Section 2, security strategies
for cache and main memory are reviewed. In Section 3, power and electromagnetic
radiation analysis are introduced. Next, in Section 4, the proposed
solution is presented and in Section 5 the results are summarized. Finally, in
Section 6 the conclusions are discussed.
Figure 4: Published proposals securing memory at hardware level against cold-boot attacks.
[Diagram: data written to the cache is scrambled, SD = Sc(D), with a session scrambling vector S; data written to main memory is encrypted, ESD = En(SD), with a session key and decrypted, De(ESD), on reads. The scrambled cache path is marked as more prone to side-channel attacks.]
2. Securing memory at hardware level
In order to provide stronger security against cold-boot attacks, solutions
closer to the hardware are needed, as pointed out in [7]. These can be
complemented by other firmware and software techniques, leading to a complete
solution that spans different abstraction levels of the system. A review of existing
hardware solutions is first presented, and then other possible alternatives are
proposed. Most can be placed in the general scheme of Fig. 4.
2.1. Main memory
The most secure way to protect main memory data is to encrypt it in real
time. A session key is generated and a strong, secure algorithm like AES [11] or
a low-latency one like PRINCE [12] is used to encrypt and decrypt according to
CPU demand. The main advantage of this method is that the circuit providing
the session key will change it if a reset condition occurs, which will invalidate
all memory data completely, thwarting any attempt to read the memory after
boot. Memory bus encryption makes use of the time margin provided by the
cache memory such that the performance loss is buffered [13, 14, 15, 16, 17]. The
drawback of this solution is that encryption must be executed by an independent
co-processor to reach enough speed and robustness, and for power consumption
to remain below reasonable limits. That is why portable devices are not expected
to use this strategy, mainly owing to power consumption. Some smartphones
and tablets use a kind of memory encryption but this action is only applied
during user locking / unlocking [7]. In [14], a refresh mechanism was added
to the above scheme which changes the key periodically, thus strengthening
protection against side-channel attacks. The key is generated by a specialized
smart card IP. In [15], this strategy was selected to protect systems with
non-volatile memory, like ferromagnetic RAM. Encryption is executed incrementally
and, in case of unexpected reboot, the whole memory is encrypted in 5 seconds.
In [16, 17], memory encryption includes a time stamp to counteract replay
attacks.
2.2. Cache memory
A similar protection scheme can be used for cache memory. However, no
encryption process can be easily selected because of the high speed required by
the bus. Scrambling techniques provide a lighter degree of security. Data vectors
are scrambled/descrambled with a session scrambling vector (S) by an XOR
operation [18, 19]. As before, the circuit providing S will change it after
each reset, invalidating data in case of an attempt to read the cache after boot.
The advantages are that scrambling can operate at high speed and that the
impact on power consumption is negligible. In fact, Intel uses this technique to
transmit memory bus data, reducing high current peaks that could aggressively
disturb the power supply lines [20]. The main drawback is that scrambling
is not a securing but an obscuring technique, which provides a low degree of
security and is prone to side-channel attacks aiming to discover S. Hence,
frequent refresh of S could improve the level of security, but cache data would
become invalidated and would therefore need to be updated. This would require
the cache controller to shut down completely. In [18], scrambling was applied
to data and addresses. The technique aims to defend cache memory against a
wide range of side-channel attacks. It consists of a two-step scrambling process
for both data and addresses with two different encryption keys stored in the
main memory. Hence, a cold-boot attack on the main memory can disclose
the memory section containing the encryption keys. In [19], a data scrambling
technique was proposed to protect cache data as follows: the first half of a
word is scrambled with the first bit of the first half, and the second half is then
XORed with the scrambled result of the first half. The advantage is that since
no scrambling vectors have to be stored in additional hardware, no data or area
overhead has to be added. However, the patterns created by this method can
be easily understood and peculiar data samples provide adversaries with a lot
of information.
2.3. Interleaved Scrambling Technique
Interleaved Scrambling Technique (IST) is a security solution for cache
protection against cold-boot attacks presented in [21]. It enhances standard
scrambling [18, 19] because the scrambling vectors can be refreshed continuously
without interrupting communication between the CPU and cache. It can also be
integrated into a global protection scheme, as illustrated in Fig. 5.
Internally, IST works with pairs of interleaving scrambling vectors. One
(the young one) is used for writing data to and reading data from the cache
memory whereas the other (the old one) is used only for reading. Therefore,
when the pair is active, the cache memory becomes filled with data scrambled by
the young vector while being emptied of data scrambled by the old one. Once
the cache is cleared of all data scrambled by the old scrambling vector, this
vector expires and the pair is renewed such that the young scrambling vector
becomes old and a newly generated scrambling vector becomes the young one.
After reset, new scrambling vectors are generated again, invalidating cache data
completely. IST can be used alone to protect the cache memory, as in [21],
or be integrated into a global security solution. Fig. 5 shows one possible
scheme which maintains high bus speed and a low level of power compared to
alternatives that encrypt bus data. In this approach, scrambled data is also
written back to main memory. The scrambling and descrambling process is
concentrated in a single unit, which is located close to the CPU. As scrambled
data flow through all buses, they help to keep current levels stable, as pointed
out by Intel in [20]. Data stored in memory but not copied in cache will need
the corresponding scrambling vectors to be made available in case the CPU
needs them. Thus, whenever a scrambling vector expires, it is made available
encrypted with a session key in a buffer such that the CPU can store it in
a non-cached memory page. In this way, encryption speed requirements are
much lower than those of data transfer, scrambling vectors generated internally
cannot be reverse engineered and cold-boot attacks on main memory data and
scrambling vectors are thwarted.
Figure 5: Interleaved Scrambling Technique. Hardware protection for cache that can be
integrated into a global protection scheme.
[Diagram: the IST unit between the CPU and the cache holds the scrambling vectors' table, which is refreshed dynamically and at power up. Scrambled data SD is written to both cache and main memory. Expired scrambling vectors S are encrypted with a session key, ES = En(S), and kept in a stack of encrypted expired scrambling vectors (Wr stack, Rd stack). The scrambled path is marked as more prone to side-channel attacks.]
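The young/old vector rotation described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the per-line generation tag, the renewal check performed after each write, and the names `InterleavedScrambler`, `write`, `read` and `reset` are hypothetical and are not taken from [21], which describes a hardware unit.

```python
import secrets


class InterleavedScrambler:
    """Toy model of IST with one young and one old scrambling vector.

    Assumption: each cache line stores a generation tag telling which
    vector scrambled it. The real hardware bookkeeping may differ.
    """

    def __init__(self, width=8):
        self.width = width
        self.old = secrets.randbits(width)
        self.young = secrets.randbits(width)
        self.gen = 1        # generation number of the young vector
        self.lines = {}     # address -> (scrambled word, generation tag)

    def write(self, addr, data):
        # Writes always use the young vector.
        self.lines[addr] = (data ^ self.young, self.gen)
        self._renew_if_old_expired()

    def read(self, addr):
        # Reads use the young or the old vector, depending on the tag.
        word, tag = self.lines[addr]
        vector = self.young if tag == self.gen else self.old
        return word ^ vector

    def _renew_if_old_expired(self):
        # The old vector expires once no line depends on it any more;
        # the young vector then becomes old and a new young one is drawn.
        if all(tag == self.gen for _, tag in self.lines.values()):
            self.old = self.young
            self.young = secrets.randbits(self.width)
            self.gen += 1

    def reset(self):
        # After a reset both vectors are regenerated, so remanent words
        # no longer descramble to the original data (except by chance).
        self.old = secrets.randbits(self.width)
        self.young = secrets.randbits(self.width)


cache = InterleavedScrambler(width=8)
cache.write(0, 0x11)
cache.write(1, 0x22)
cache.write(0, 0x33)   # line 0 is overwritten with the young vector
assert cache.read(0) == 0x33 and cache.read(1) == 0x22
```

In this model the vectors are refreshed without ever stalling reads or writes, which is the property the text attributes to IST; eviction of lines is omitted for brevity.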
2.3.1. Side-channel attacks on IST
Even though IST refreshes scrambling vectors periodically, memories
protected by IST are still prone to side-channel attacks because the scrambling
technique is not intrinsically robust against this type of attack. By using
power or electromagnetic radiation analysis, an adversary could discover the
scrambling vector in use for writing after several attacks. Although this would
be difficult in practice, he could theoretically descramble cache or main
memory data downloaded after a cold-boot attack. In the coming paragraphs, this
weakness is explained and the methodology to improve IST robustness against
this type of side-channel attack is presented and evaluated.
Figure 6: Switching drivers and bus line currents excite power peaks and electromagnetic
pulses which leak information about data flowing into the drivers and buses.
[Diagram: the CPU data D (n bits) is XORed with S to give SD, which the bus driver sends over the data memory bus to the cache. The load capacitances CL of the n bus lines draw current IDD from VDD and return IGND to ground; the resulting magnetic field radiates EMDD.]
3. Power (P) and Electromagnetic (EM) Radiation Analysis
For the sake of clarity and without loss of generality, our discussion focuses
on a scrambling circuit using a single scrambling vector. Let us assume that
a data vector D is placed in the data bus. Before it is sent to the cache,
IST mixes it with a scrambling vector S through XOR gates such that the
scrambled data vector SD = D ⊕ S is generated. Reverting this transformation
(descrambling) is simple, SD ⊕ S = D ⊕ S ⊕ S = D. A simplified model of the
elements involved in information leakage is shown in Fig. 6.
stage logic. This means that in each clock cycle some of the lines first source
current from the power supply line and then drain it to ground. All switching bus
lines source currents from the power supply line PDD whose intensity is strongly
related to the amount of scrambled data transmitted through the bus. Similarly,
the magnetic fields of these currents add up and generate an electromagnetic200
wave EMDD whose intensity depends on the scrambled data too. Hamming
weight is a metric that adds up the contributions of individual bits in a word.
Hence, it is often used to find correlations between data and power or radiated
electromagnetic intesity [22].
Using this side-channel leakage an adversary can undergo the cache memory205
to a Simple Power or Electromagnetic Analysis (SPEMA) or even to a more
powerful Differential Power or Electromagnetic Analysis (DPEMA). The aim
of the attack is to recover the internal S with which the cold-boot attack can
succeed afterwards.
8
In the most sophisticated version the attack would behave like this. Assume210
that the victim cache has sensitive data in it, scrambled with vector S1. Imme-
diately, the adversary attacks the victim with an SPEMA or DPEMA, estimates
the scrambling vector S∗1 and carries out the cold-boot attack by abruptly dis-
connecting the supply and booting the system with an ad-hoc operating system.
Promptly after, the adversary reads out the sensitive data from the victim cache215
and stores them in a backup memory. However, if the scrambling technique is
like IST [21] after the cold boot attack the scrambling vector will change to a
new one, e.g. S2 such that the stolen data recovered by the adversary will be
now SD = D⊕S1⊕S2, in which D is the sensitive data wanted. Therefore, the
adversary will have to perform a second SPEMA or DPEMA attack to estimate220
S∗2 . In the final step the double scrambling is undone by the adversary on the
stolen data (SD = D ⊕ S1 ⊕ S2) ⊕ S∗1 ⊕ S∗2 → D and the sensitive data is
recovered provided that the estimations of S∗1 and S
∗
2 are correct.
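The final descrambling step can be checked with a few lines of arithmetic. The values below are arbitrary 8-bit examples chosen for this sketch; it also shows that every wrongly estimated bit of S∗1 or S∗2 flips exactly the corresponding bit of the recovered data.

```python
D = 0b10110010    # sensitive data (arbitrary example)
S1 = 0b01101100   # vector in use before the cold boot (arbitrary)
S2 = 0b11000101   # vector in use after the cold boot (arbitrary)

stolen = D ^ S1 ^ S2   # what the adversary reads out after rebooting

# Correct estimates S1* = S1 and S2* = S2 recover the data exactly.
assert stolen ^ S1 ^ S2 == D

# A single wrong bit in S1* corrupts only that bit of the recovered data.
S1_wrong = S1 ^ 0b00000100
recovered = stolen ^ S1_wrong ^ S2
assert recovered ^ D == 0b00000100
```

This is why a countermeasure only needs to lower the probability of estimating each bit correctly: the error in the recovered data grows with the number of wrongly estimated bits.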
In a conference paper [23], the authors presented how an SPEMA attack could
be carried out in a cache memory protected by a scrambling technique like IST
and proposed a countermeasure against it, which consists of decreasing the
probability of correctly estimating S∗. In this paper this SPEMA countermeasure
is extended to fight against DPEMA attacks too, so that the complete solution
becomes resilient against both types of side-channel attacks. The rest of the
paper is focused on DPEMA attacks; however, the solution proposed and
the results presented will include SPEMA attacks too.
3.1. Differential P or EM Radiation Analysis Attack
DPEMA is an even more powerful attack than SPEMA despite requiring a
longer time to be completed. In the context of cache memory, this attack does
not discover the whole scrambling vector at once but estimates it on a bit by
bit basis [22].
Let us assume that the adversary attempts to estimate bit sj of the scrambling
vector. He first builds two subsets of profiling vectors (data vectors):
subset D(dj = 0) contains all combinations of values in which the data bit located
at the same position, dj, is constant at 0, and subset D(dj = 1) is similar
but with bit dj constant at 1. Then he applies the first subset D(dj = 0) in a
continuous loop and estimates the average Hamming weight HWavg(dj = 0) of
the scrambled data. He repeats the same action with the second subset
D(dj = 1) and estimates the average Hamming weight HWavg(dj = 1). Finally,
he guesses bit s∗j of S∗ as follows:
$$ s_j^* = \begin{cases} 0, & \text{if } HW_{avg}(d_j = 0) < HW_{avg}(d_j = 1) \\ 1, & \text{if } HW_{avg}(d_j = 0) > HW_{avg}(d_j = 1) \end{cases} \qquad (1) $$
This operation is repeated for each bit and once all bits are estimated, the
scrambling vector is built as the concatenation of:
$$ S^* = (s^*_{n-1}, s^*_{n-2}, \ldots, s^*_1, s^*_0) \qquad (2) $$
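The bit-by-bit procedure of Eqs. (1) and (2) can be simulated in software. The sketch below is an illustration under assumptions that are not part of the paper: the leakage is modeled as the exact Hamming weight of the scrambled word, plain XOR scrambling with a single vector is used, and the names `dpema_estimate`, `leak` and `SECRET_S` are hypothetical.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def dpema_estimate(leak, n_bits, samples=2000, seed=0):
    """Estimate an n-bit scrambling vector from a leakage function.

    leak(d) returns the observed leakage when data vector d is written.
    """
    rng = random.Random(seed)
    estimate = 0
    for j in range(n_bits):
        averages = []
        for fixed in (0, 1):
            total = 0.0
            for _ in range(samples):
                d = rng.getrandbits(n_bits)          # other bits are random
                d = (d & ~(1 << j)) | (fixed << j)   # bit d_j held constant
                total += leak(d)
            averages.append(total / samples)
        # Eq. (1): s_j* = 1 when HWavg(d_j = 0) > HWavg(d_j = 1)
        if averages[0] > averages[1]:
            estimate |= 1 << j
    return estimate                                  # Eq. (2): concatenation


SECRET_S = 0b10110100  # hidden scrambling vector (arbitrary example)


def leak(d):
    # Idealized side channel: Hamming weight of SD = D xor S.
    return hamming_weight(d ^ SECRET_S)


recovered = dpema_estimate(leak, n_bits=8)
assert recovered == SECRET_S
```

Holding d_j constant shifts the average by one full bit, while the remaining random bits only add zero-mean fluctuation, so a modest number of samples per subset is enough in this idealized model.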
The strength of this attack lies in the fact that it is not necessary to apply
all combinations of the remaining bits, which are not kept constant. It is even better
to change them randomly, thus reducing the number of vectors significantly and
obtaining a tighter estimate of the average Hamming weight with very low noise.
Furthermore, the Hamming weight is not really necessary since the comparison
in Eq. (1) can be made directly with the physical magnitudes PDDavg(dj = 0)
and PDDavg(dj = 1) or EMDDavg(dj = 0) and EMDDavg(dj = 1) measured by
the external instruments. One of the most dangerous points is that the physical
magnitude can be averaged over thousands of samples, which would greatly
reduce the effect of (accidental or intentional) noise and would strengthen the
signal induced by the sought information.
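The effect of averaging can be illustrated with a noisy version of the leakage. In this sketch the measurement noise is modeled as additive Gaussian noise with standard deviation `sigma`; this noise model, the value of `sigma` and the function name `mean_leakage` are assumptions made for illustration only.

```python
import random


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def mean_leakage(s, j, fixed, samples, sigma, rng, n_bits=8):
    """Average noisy leakage with data bit d_j held at `fixed`."""
    total = 0.0
    for _ in range(samples):
        d = rng.getrandbits(n_bits)
        d = (d & ~(1 << j)) | (fixed << j)
        # Hamming weight of the scrambled word plus Gaussian noise.
        total += hamming_weight(d ^ s) + rng.gauss(0.0, sigma)
    return total / samples


S = 0b10110100          # hidden scrambling vector, bit 2 is 1
rng = random.Random(1)
for samples in (10, 1000, 100000):
    diff = (mean_leakage(S, 2, 0, samples, 4.0, rng)
            - mean_leakage(S, 2, 1, samples, 4.0, rng))
    print(samples, round(diff, 3))
```

With few samples the difference between the two averages is dominated by noise, whereas with many samples it settles close to +1, the contribution of the targeted bit, since the standard error of an average of N samples shrinks as 1/sqrt(N).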
3.2. Attack model
For the rest of the paper it is assumed that the adversary can measure
current consumption using sensors attached to the power supply or measure
electromagnetic power radiation by means of antenna probes. These can detect
internal activity in the memory bus. He knows the model of the L2 cache and
understands the IST process/operation. Moreover, he has control over some
data vectors generated by the CPU, which allow him to estimate the scrambling
vector in use. He cannot read cache memory sensitive data directly from the
CPU, since it is assumed that the operating system blocks protected memory
addresses when the critical program is in operation; therefore, to do so, the system
must undergo a cold-boot attack such as those described above. The adversary
cannot read cache content from the outside of the CPU because the former
cannot be detached from the latter.
4. Proposed solution
This section presents the solution to protect the cache memory against
cold-boot attacks boosted with DPEMA. The idea is to reduce the amount of
information leaking from the system and to modify the correlations such that
confusion prevents proper operation of the attack model, that is, to prevent the
analysis with DPEMA from revealing the true scrambling vector. Since this
solution is built on a previous countermeasure against SPEMA, proposed by
the same authors in a conference paper [23], this scheme is introduced first and
the DPEMA countermeasure is explained afterwards.
4.1. ISTe, SPEMA countermeasure
In [23], the assumed cache model includes an error detection and correction
scheme [24] to which IST is added; this combination is the basis of the SPEMA
countermeasure. A general overview of the cache design is illustrated in Fig. 7.
The components of this model are the following: the CPU, which provides data
(D3); full adder blocks (∑) that calculate carry and sum bits; the scrambling
vector (S5); and XOR gates. The operation of this model is explained below. The
symbol ISTe will distinguish this case from that in which the cache does not have
an error detection and correction scheme.
Figure 7: Model of the cache memory with error detection and correction scheme in which
IST is applied (ISTe).
[Diagram: each 3-bit group (d2, d1, d0)^i of the CPU data D3 feeds a full adder that appends carry and sum bits c^i, s^i to form D5. D5 is XORed with the scrambling vector S5, whose groups are (s2, s1, s0)^i, s_c^i, s_s^i, to give SD5, which is sent to the cache memory.]
In this particular scheme, the data vector bits (D3) are separated into groups of
three bits, to which two bits are appended corresponding to the sum and carry
bits. Therefore, each data vector D3 = {..., (d2, d1, d0)^i, ...} is first extended
with the error detection and correction bits and transformed into the new vector
D5 = {..., (d2, d1, d0)^i, c^i, s^i, ...}. Then the scrambling transformation is
applied as in any regular scrambling technique, cf. Fig. 7. Correspondingly, the
scrambling vector provides redundant bits, S5 = {..., (s2, s1, s0)^i, s_c^i, s_s^i, ...},
to protect the data redundancy. These bits s_c^i, s_s^i are not randomly generated but
calculated according to the following equations [23],
$$ \begin{aligned} s_c &= \mathrm{not}(\mathrm{carry}(s_2, s_1, s_0)) = \overline{s_2 s_1 + s_2 s_0 + s_1 s_0} \\ s_s &= \mathrm{not}(\mathrm{sum}(s_2, s_1, s_0)) = \overline{s_2 \oplus s_1 \oplus s_0} \end{aligned} \qquad (3) $$
while bits s2, s1, s0 are generated randomly. Contrary to intuition, generating
s_c^i and s_s^i randomly would compromise the security. This last fact is
illustrated in the results of Section 5.
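The construction of one 5-bit group of D5 and S5 can be written out as follows. This is a sketch of Eq. (3) for a single 3-bit group; the function names `full_adder`, `extend_data` and `extend_scrambling` are hypothetical and chosen for this example.

```python
def full_adder(b2, b1, b0):
    """Return (carry, sum) of three bits, as computed by a full adder."""
    carry = (b2 & b1) | (b2 & b0) | (b1 & b0)
    total = b2 ^ b1 ^ b0
    return carry, total


def extend_data(group):
    """(d2, d1, d0) -> (d2, d1, d0, c, s): append carry and sum bits."""
    carry, total = full_adder(*group)
    return (*group, carry, total)


def extend_scrambling(group):
    """(s2, s1, s0) -> (s2, s1, s0, sc, ss) with inverted carry and sum, Eq. (3)."""
    carry, total = full_adder(*group)
    return (*group, 1 - carry, 1 - total)


assert extend_data((0, 1, 1)) == (0, 1, 1, 1, 0)
assert extend_scrambling((1, 0, 1)) == (1, 0, 1, 0, 1)   # S5 = (101 01)
assert extend_scrambling((0, 1, 0)) == (0, 1, 0, 1, 0)
```

The only difference between the data path and the scrambling path is the inversion of the two redundant bits, so the same full adder structure can serve both.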
The SPEMA countermeasure shown in Fig. 7 is insecure against DPEMA
attacks, as illustrated in the following example.
4.2. Example of DPEMA attack on ISTe
For simplicity and without loss of generality, consider a three-bit data bus, i.e.
three data bits plus two redundant bits, as illustrated in Fig. 7. First the
adversary prepares data vector sets for the attack, the zero vector set D5(dj = 0)
and the one vector set D5(dj = 1), as shown in Tab. 1. It includes the sets for
each one of the data bits.
Table 1: Zero and one data vector sets for the DPEMA attack in a three bit bus example.
                 j = 2               j = 1               j = 0
                 d2 d1 d0  c  s      d2 d1 d0  c  s      d2 d1 d0  c  s
D5(dj = 0)        0  0  0  0  0       0  0  0  0  0       0  0  0  0  0
                  0  0  1  0  1       0  0  1  0  1       0  1  0  0  1
                  0  1  0  0  1       1  0  0  0  1       1  0  0  0  1
                  0  1  1  1  0       1  0  1  1  0       1  1  0  1  0
D5(dj = 1)        1  0  0  0  1       0  1  0  0  1       0  0  1  0  1
                  1  0  1  1  0       0  1  1  1  0       0  1  1  1  0
                  1  1  0  1  0       1  1  0  1  0       1  0  1  1  0
                  1  1  1  1  1       1  1  1  1  1       1  1  1  1  1
The attack consists of estimating the average Hamming weights for each
one of these sets. The adversary activates the system and, for example, starts
by applying the zero set D5(d2 = 0) in a continuous loop such that he excites
current consumption and EM radiation power at a stable rate. He calculates the
average of these physical magnitudes and then estimates the average Hamming
weight HWavg(d2 = 0). By repeating the same process with the one set, he
obtains HWavg(d2 = 1). Finally, with Eq. (1) he guesses bit s2 of the scrambling
vector as
$$ s_2^* = \begin{cases} 0, & \text{if } HW_{avg}(d_2 = 0) < HW_{avg}(d_2 = 1) \\ ?, & \text{if } HW_{avg}(d_2 = 0) = HW_{avg}(d_2 = 1) \\ 1, & \text{if } HW_{avg}(d_2 = 0) > HW_{avg}(d_2 = 1) \end{cases} $$
Following the same methodology, bits s1 and s0 are guessed too. It is unnecessary
to make predictions for bits sc and ss because they are only part of the error
correction scheme.
In Tab. 2 the complete calculations are shown. It is divided into four
sub-tables: (I) has the matrix of Hamming weights generated in the data bus after
scrambling the data vector D5 (shown at the left side of the matrix) with all
possible scrambling vectors S5 (at the top side of the matrix); (II) has the
average Hamming weights HWavg(d2 = 0) and HWavg(d2 = 1) generated after
running the sets D5(d2 = 0) and D5(d2 = 1) respectively. From these averages
the estimation of bit s2 is made and shown at the bottom of this sub-table.
Sub-tables (III) and (IV) show the same information but for bits 1 and 0 of the
data and scrambling vectors, giving the estimations of s1 and s0 respectively.
Table 2: Example of DPEMA attack applied to a three bit data bus with ISTe scrambling
technique. (I) is the matrix of Hamming weights generated in the data bus after the scrambling.
(II), (III) and (IV) are the average Hamming weights used to estimate bit s2, s1 and s0
respectively.
(I)   D5 \ S5         000 11  001 10  010 10  011 01  100 10  101 01  110 01  111 00
      000 00            2       2       2       3       2       3       3       3
      001 01            2       2       4       1       4       1       3       3
      010 01            2       4       2       1       4       3       1       3
      011 10            3       1       1       2       3       4       4       2
      100 01            2       4       4       3       2       1       1       3
      101 10            3       1       3       4       1       2       4       2
      110 10            3       3       1       4       1       4       2       2
      111 11            3       3       3       2       3       2       2       2
(II)  HWavg(d2 = 0)    2.3     2.3     2.3     1.8     3.3     2.8     2.8     2.8
      HWavg(d2 = 1)    2.8     2.8     2.8     3.3     1.8     2.3     2.3     2.3
      Estimated s2      0       0       0       0       1       1       1       1
(III) HWavg(d1 = 0)    2.3     2.3     3.3     2.8     2.3     1.8     2.8     2.8
      HWavg(d1 = 1)    2.8     2.8     1.8     2.3     2.8     3.3     2.3     2.3
      Estimated s1      0       0       1       1       0       0       1       1
(IV)  HWavg(d0 = 0)    2.3     3.3     2.3     2.8     2.3     2.8     1.8     2.8
      HWavg(d0 = 1)    2.8     1.8     2.8     2.3     2.8     2.3     3.3     2.3
      Estimated s0      0       1       0       1       0       1       0       1
Labels: (A) is the column S5 = 101 01; (B) is HWavg(d2 = 0) = 2.8, (C) is HWavg(d2 = 1) = 2.3 and (D) is the estimated s2 = 1, all in column (A). The estimated rows follow from Eq. (1) applied to the two rows above them.
Figure 8: Overview of the RM-ISTe technique.
[Diagram: the ISTe scheme of Fig. 7 extended with a random bit generator that supplies bit r; the vector sent to the cache memory is SD5r.]
Let us focus on one particular case. Assume that the internal scrambling
vector is S5 = (101 01), cf. label (A) in Tab. 2. When the CPU sends data
vectors D5 = {000 00, 001 01, ..., 111 11} to the scrambling circuit, the Hamming
weights HW = {3, 1, 3, 4, 1, 2, 4, 2} are excited in the data bus, see the column under
label (A). Based on this fact, the adversary starts by exciting the scrambling circuit
with the zero data vector set D5(d2 = 0), which will excite in the data bus only
the Hamming weights {3, 1, 3, 4}. After averaging them the adversary obtains
the average Hamming weight HWavg(d2 = 0) = 2.8, cf. label (B). Next, he
excites the scrambling circuit with the one data vector set D5(d2 = 1), which
will excite in the data bus only the Hamming weights {1, 2, 4, 2}, and obtains
the new average Hamming weight HWavg(d2 = 1) = 2.3, cf. label (C). Finally,
since HWavg(d2 = 0) > HWavg(d2 = 1), bit s2 is correctly estimated as s∗2 = 1,
according to Eq. (1), cf. label (D).
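The worked example can be reproduced programmatically. The sketch below recomputes column (A) of Tab. 2 and the averages used to estimate s2, assuming an ideal Hamming-weight leakage; 3-bit groups are encoded as integers 0 to 7, and the helper names `d5`, `s5`, `avg_hw` and `estimate` are hypothetical.

```python
def redundancy(v):
    """Return (carry, sum) of the three bits of v (0..7)."""
    b2, b1, b0 = (v >> 2) & 1, (v >> 1) & 1, v & 1
    return (b2 & b1) | (b2 & b0) | (b1 & b0), b2 ^ b1 ^ b0


def d5(v):
    """Data group with carry and sum appended: (d2 d1 d0 c s)."""
    c, s = redundancy(v)
    return (v << 2) | (c << 1) | s


def s5(v):
    """Scrambling group with inverted carry and sum, as in Eq. (3)."""
    c, s = redundancy(v)
    return (v << 2) | ((1 - c) << 1) | (1 - s)


def hw(x):
    return bin(x).count("1")


def avg_hw(s, j, fixed):
    """Average Hamming weight on the bus for the set D5(d_j = fixed)."""
    weights = [hw(d5(d) ^ s5(s)) for d in range(8) if (d >> j) & 1 == fixed]
    return sum(weights) / len(weights)


def estimate(s):
    """Apply Eq. (1) to each of the three bits of the scrambling group."""
    guess = 0
    for j in range(3):
        if avg_hw(s, j, 0) > avg_hw(s, j, 1):
            guess |= 1 << j
    return guess


# Column (A): S5 = (101 01)
assert [hw(d5(d) ^ s5(0b101)) for d in range(8)] == [3, 1, 3, 4, 1, 2, 4, 2]
assert avg_hw(0b101, 2, 0) == 2.75 and avg_hw(0b101, 2, 1) == 2.25
# The attack recovers every possible 3-bit scrambling group under ISTe.
assert all(estimate(s) == s for s in range(8))
```

The exact averages are 2.75 and 2.25; Tab. 2 shows them rounded to 2.8 and 2.3.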
4.3. DPEMA countermeasure
Random masking ISTe (RM-ISTe) is a strategy for balancing average Hamming
weights HWavg(dj = 0) and HWavg(dj = 1) to make them equal or
randomly unequal in order to render Eq. (1) useless. The circuit schematic
of this countermeasure is shown in Fig. 8, in which the red part shows the
modifications to include this random masking.
A random bit generator providing bit r is added to the previous design. This
bit changes randomly after each CPU write cycle. For r = 0, the scrambling
vector is taken as it is, whereas for r = 1 the scrambling vector is inverted before
scrambling the data vector. Once the scrambled data is stored in the cache, bit
r is appended to it. Therefore, the scrambled data vector which is sent to the
cache memory is SD5r = (SD5, r), where bit r equally affects all scrambled data
bits.
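The balancing effect can be checked exhaustively for one 3-bit group. The sketch below is an illustration with hypothetical function names; it assumes that r takes the values 0 and 1 with equal probability, which the text does not state explicitly. Because inverting the 5-bit scrambling vector turns a hamming weight h into 5 − h, the mean over r is the same for every data vector.

```python
from itertools import product

def encode(bits3, invert_redundancy):
    """Append carry and sum (inverted for scrambling vectors) to a 3-bit group."""
    ones = sum(bits3)
    carry, total = ones >> 1, ones & 1
    if invert_redundancy:
        carry, total = 1 - carry, 1 - total
    return tuple(bits3) + (carry, total)

def scrambled_weight(d5, s5, r):
    """Hamming weight of D5 XOR S5, with S5 inverted when r = 1."""
    mask = tuple(bit ^ r for bit in s5)
    return sum(a ^ b for a, b in zip(d5, mask))

def expected_weight(d3, s3):
    """Mean hamming weight over r, assuming r is 0 or 1 with equal probability."""
    d5, s5 = encode(d3, False), encode(s3, True)
    return (scrambled_weight(d5, s5, 0) + scrambled_weight(d5, s5, 1)) / 2

groups = list(product((0, 1), repeat=3))
print({expected_weight(d3, s3) for d3 in groups for s3 in groups})  # {2.5}
```

Since the expected weight does not depend on the data vector, the zero set and the one set have the same expected average, and any difference seen by the adversary comes only from the random choice of r.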
To better understand how the countermeasure works, Fig. 9 presents the
flowchart of the write cycle, including an example on 9 bits. When a write
cycle begins, the CPU generates the data vector D3 that needs to be stored in
the L2 cache. At the same time, a scrambling vector S3 is retrieved from the
scrambling table. The redundancy is calculated for the data vector (for every 3
bits of data, a carry and a sum bit are appended) and D5 is generated. Similarly,
the redundancy for S3 is computed (for every 3 bits, the 4th bit is the inverse
of the carry, while the 5th bit is the inverse of the sum) and S5 is generated.
Then, the random masking bit r is obtained from the random generator and its
value is checked. If it is 1, S5 is inverted and XORed with D5, generating the
scrambled data SD5; otherwise the inversion of S5 is not performed. Finally,
SD5 is stored in the cache, together with the random masking bit r.
[Flowchart of the write cycle (methodology and example). Example values:
  Data vector:              011 100 101
  Scrambling vector:        010 011 110
  Append carry and sum:     011 10 100 01 101 10
  Append redundancy:        010 10 011 01 110 01
  Generate r bit:           1
  r = 1? Y, invert S5:      101 01 100 10 001 10
  XOR vectors D5 and S5:    001 00 111 00 011 11
  Store SD5 and r:          001 00 111 00 011 11 1]
Figure 9: Architecture flowchart of the write cycle in the RM-ISTe technique. An example on
9 bits is included.
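The write cycle can be sketched on bit strings as follows. The helper names are hypothetical and the code follows the textual description only (carry and sum for the data, inverted carry and sum for the scrambling vector, inversion of S5 when r = 1); it makes no claim about the actual hardware.

```python
def split_groups(word, size):
    """Split a bit string into consecutive groups of `size` bits."""
    return [word[i:i + size] for i in range(0, len(word), size)]

def extend_data(d3):
    """Append carry and sum bits to every 3-bit group of the data vector."""
    out = []
    for group in split_groups(d3, 3):
        ones = group.count("1")
        out.append(group + str(ones >> 1) + str(ones & 1))
    return "".join(out)

def extend_scrambler(s3):
    """Append inverted carry and inverted sum to every 3-bit group of S3."""
    out = []
    for group in split_groups(s3, 3):
        ones = group.count("1")
        out.append(group + str(1 - (ones >> 1)) + str(1 - (ones & 1)))
    return "".join(out)

def invert(word):
    return "".join("1" if bit == "0" else "0" for bit in word)

def xor(a, b):
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

def write_cycle(d3, s3, r):
    """Return the word sent to the cache: SD5 followed by the masking bit r."""
    d5 = extend_data(d3)
    s5 = extend_scrambler(s3)
    mask = invert(s5) if r == 1 else s5
    return xor(d5, mask) + str(r)

print(extend_data("011100101"))                  # 011101000110110
print(extend_scrambler("010011110"))             # 010100110111001
print(write_cycle("011100101", "010011110", 0))  # 0010011100011110
print(write_cycle("011100101", "010011110", 1))  # 1101100011100001
```

With the data and scrambling vectors of the Fig. 9 example, the r = 0 result of this sketch matches the 15-bit XOR value printed in the figure; for r = 1, with S5 inverted before the XOR as the text describes, the sketch yields the bitwise complement of that value followed by r = 1.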
4.4. Example of DPEMA attack on RM-ISTe
Tab. 3 presents the same DPEMA attack example as in Tab. 2 but for the
RM-ISTe technique. It exemplifies how the countermeasure works and, as before,
it is made on a three-bit data bus architecture. Notice that, since the scrambling
technique is bit-wise, the conclusions of this example can be extended to any
data bus size.
The table is divided into the same four sub-tables (I)–(IV) as before. Sub-
table (I) contains the matrix of the hamming weights generated in the data
bus by the scrambling operations and sub-tables (II) to (IV) have the average
hamming weights used by the adversary to estimate the scrambling vector
bits. Unlike in the previous example (Tab. 2), the top row of matrix (I) now has
not only all possible scrambling vectors S5 for three bits but also their inversions
S̄5. This is so because each time the CPU writes, the data vector D5 can be
randomly scrambled with S5 or S̄5. A second difference in matrix (I) is that
the hamming weights are printed in three columns (HW / t2, HW / t1, HW
/ t0) for each scrambling vector S5. Each of these columns indicates a random
selection of the scrambling vector S5 or S̄5, so at three different time instants
(t2, t1 and t0) the hamming weight obtained after the scrambling operation can
be different. The background color of each number indicates which scrambling
vector has been used (green for S5, pink for S̄5).

Table 3: DPEMA attack applied to RM-ISTe countermeasure.
For each scrambling vector, columns S5 and S̄5 give the hamming weight
obtained with the plain and with the inverted vector, and columns t2, t1 and t0
give the weight observed at each time instant. An observed value equal to the
S5 column was generated with S5; a value equal to the S̄5 column, with S̄5.

(I)  S5 / S̄5   000 11 / 111 00   001 10 / 110 01   010 10 / 101 01   011 01 / 100 10
     D5         S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5
     000 00      2  3  2  3  3     2  3  2  2  3     2  2  3  3  3     3  2  3  3  2
     001 01      2  3  2  3  3     2  3  3  3  3     4  4  1  1  1     1  4  1  1  4
     010 01      2  2  3  3  3     4  4  1  1  1     2  2  3  2  3     1  1  1  1  4
     011 10      3  2  3  2  2     1  1  1  1  4     1  1  4  1  4     2  3  3  2  3
     100 01      2  2  2  3  3     4  4  4  1  1     4  1  4  1  1     3  3  3  2  2
     101 10      3  3  2  3  2     1  1  4  4  4     3  2  3  3  2     4  1  4  4  1
     110 10      3  2  3  2  2     3  3  2  2  2     1  1  4  1  4     4  1  1  1  1
     111 11      3  3  3  2  2     3  3  2  3  2     3  2  3  2  2     2  2  3  3  3
(II)   HWavg(d2 = 0), from t2     2.5     2.8     2.3     2.5
       HWavg(d2 = 1), from t2     2.5     2.8     1.5     1.8
(III)  HWavg(d1 = 0), from t1     2.0     3.3     2.8     2.8
       HWavg(d1 = 1), from t1     3.0     1.5     3.5     2.0
(IV)   HWavg(d0 = 0), from t0     2.8     1.5     1.8     1.8
       HWavg(d0 = 1), from t0     2.5     2.8     1.8     2.5

(I)  S5 / S̄5   100 10 / 011 01   101 01 / 010 10   110 01 / 001 10   111 00 / 000 11
     D5         S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5   S5 t2 t1 t0 S̄5
     000 00      2  3  2  3  3     3  2  2  3  2     3  2  2  2  2     3  2  3  2  2
     001 01      4  4  1  4  1     1  1  4  1  4     3  3  3  3  2     3  3  3  3  2
     010 01      4  1  1  1  1     3  3  3  2  2     1  4  4  4  4     3  2  2  3  2
     011 10      3  3  3  2  2     4  4  1  4  1     4  1  4  1  1     2  2  3  3  3
     100 01      2  2  2  3  3     1  1  1  4  4     1  1  1  1  4     3  2  3  3  2
     101 10      1  4  4  1  4     2  3  3  2  3     4  1  4  1  1     2  3  2  3  3
     110 10      1  4  4  1  4     4  4  4  4  1     2  3  2  2  3     2  2  3  3  3
     111 11      3  2  3  2  2     2  3  2  2  3     2  2  2  3  3     2  3  3  2  3
(II)   HWavg(d2 = 0), from t2     2.8     2.5     2.5     2.3
       HWavg(d2 = 1), from t2     3.0     2.8     1.8     2.5
(III)  HWavg(d1 = 0), from t1     1.8     2.5     2.5     2.8
       HWavg(d1 = 1), from t1     3.3     2.5     3.0     2.8
(IV)   HWavg(d0 = 0), from t0     2.0     3.3     2.3     2.8
       HWavg(d0 = 1), from t0     2.3     2.3     2.0     2.8
As before, we will examine the case for the scrambling vector S5 = (101 01),
cf. label (A), whose inverted value is S̄5 = (010 10), cf. label (B). Suppose that
the adversary starts, at time instant t2, attacking bit s2 and for this purpose
he first excites the scrambling circuit once with the zero set D5(d2 = 0). He
captures the hamming weights {2, 1, 3, 4}, cf. label (C), and obtains the average
hamming weight HWavg(d2 = 0) = 2.5, where three out of four values
correspond to the use of S5 and one to the use of S̄5. Then, he applies the one set
D5(d2 = 1) and captures the new values for the hamming weights {1, 3, 4, 3}, cf.
label (D), and estimates the average hamming weight HWavg(d2 = 1) = 2.8.
In this second estimation, two out of four hamming weights are generated by S5
while the other two are generated by S̄5. He finally applies Eq. 1 and estimates
s∗2 = 0, cf. label (E), which is wrong because s2 = 1.
Next, the adversary repeats the procedure to obtain bit s1, but now, when
he applies the zero and one sets at time instant t1, he will obtain different
selections for the S5 and S̄5 scrambling vectors, cf. labels (F) and (G). After
calculating the averages, he will obtain two equal values, HWavg(d1 = 0) =
HWavg(d1 = 1) = 2.5, and consequently it will not be possible to reliably
estimate the corresponding scrambling vector bit s∗1 = ?, cf. label (H). Finally,
by repeating the process at time instant t0 for the last bit, the estimation
obtained is s∗0 = 1, which in this case is correct by chance, cf. label (I).
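The three estimations above can be replayed with a short sketch. The function names are hypothetical, and the r patterns are not given explicitly in the text: they are the ones implied by the hamming weights listed for S5 = (101 01) in Tab. 3, obtained by comparing each observed weight with the unmasked one. A tie between the two averages is reported as None.

```python
from itertools import product

DATA = list(product((0, 1), repeat=3))   # data vectors 000, 001, ..., 111

def encode(bits3, invert_redundancy):
    """Append carry and sum (inverted for scrambling vectors) to a 3-bit group."""
    ones = sum(bits3)
    carry, total = ones >> 1, ones & 1
    if invert_redundancy:
        carry, total = 1 - carry, 1 - total
    return tuple(bits3) + (carry, total)

def masked_weight(d3, s3, r):
    """Hamming weight of D5 XOR S5, with S5 inverted when r = 1."""
    d5, s5 = encode(d3, False), encode(s3, True)
    return sum(a ^ b ^ r for a, b in zip(d5, s5))

def attack_bit(s3, position, r_pattern):
    """Return (avg zero set, avg one set, estimate); position 0 is d2."""
    zero, one = [], []
    for d3, r in zip(DATA, r_pattern):
        (one if d3[position] else zero).append(masked_weight(d3, s3, r))
    avg_zero, avg_one = sum(zero) / len(zero), sum(one) / len(one)
    if avg_zero == avg_one:
        return avg_zero, avg_one, None          # no reliable estimate
    return avg_zero, avg_one, 1 if avg_zero > avg_one else 0

S3 = (1, 0, 1)
# r values per data vector 000..111, as implied by Tab. 3 (assumption, see above)
R_T2 = [1, 0, 0, 0, 0, 1, 0, 1]
R_T1 = [1, 1, 0, 1, 0, 1, 0, 0]
R_T0 = [0, 0, 1, 0, 1, 0, 0, 0]

print(attack_bit(S3, 0, R_T2))   # (2.5, 2.75, 0): wrong, since s2 = 1
print(attack_bit(S3, 1, R_T1))   # (2.5, 2.5, None): undecidable
print(attack_bit(S3, 2, R_T0))   # (3.25, 2.25, 1): correct by chance
```

The averages 2.75, 3.25 and 2.25 correspond to the rounded values 2.8, 3.3 and 2.3 printed in Tab. 3.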
5. Evaluation and results
The proposed technique RM-ISTe is evaluated on a virtual implementation
of the scrambled cache memory where several switches configure different kinds
of countermeasures. The results are compared with those of the previous techniques
IST and ISTe [21, 23].
The security is evaluated by calculating the information that
leaks through the hamming weight. The leakage function L(s), which is based on
information entropy [25], evaluates how close the estimated scrambling
bits s∗j are to the real ones sj. This function is bounded in the interval 0 ≤ L(s) ≤
1, in which 0 means an always wrong estimation while 1 corresponds to an always
correct estimation. The derivation of this function can be found in Appendix
A.
An ad hoc simulation environment is programmed in C++, which includes
the control of the data bus, the scrambling circuit and the cache memory. A
wrapper emulates the behavior of the attack in the virtual environment and
virtual instruments monitor the performance of the different countermeasures.
The wrapper also gives us full control over the generation of scrambling vectors
and observation of scrambled data. This environment runs on a Supermicro
workstation with the following specifications: 64 AMD cores of 64 bits, 256 GB
of memory, 6 TB of disk space and the CentOS Linux operating system.
The virtual environment allows the modification of the architecture size. The
authors have full control over the data bus, through which bursts of data vectors
can be repeatedly sent to carry out attacks. An internal monitor measures the
hamming weight generated during data transfer to the cache, including
transformations made by the scrambling circuit. Monitoring the hamming weight
as a direct metric for the attack is a conservative way to evaluate the strength
of countermeasures because it is an upper bound of the physical measures that
the adversary would achieve by the observation of power consumption or EM
radiation intensity, as explained in Section 3. Furthermore, the
periodical refreshes of scrambling vectors that are made by IST and the derived
techniques in order to limit the effectiveness of attacks are disabled. In these
experiments the same scrambling vector is kept in use throughout the attack
because we are more interested in evaluating the strength of the random masking
scheme than the effectiveness of the whole scheme. Therefore, results must be
understood as an upper bound of the attack success.
The emulated configurations are:
• S3 - IST scrambling technique [21].
• S5R5 - ISTe′′ SPEMA countermeasure, but where the two redundant bits
of the scrambling vector s^i_c, s^i_s are generated randomly. This configuration
is presented to illustrate the comment made in Subsection 4.1 that full
random generation of the redundancy decreases the security instead of
increasing it.
• S5R4C1 - ISTe′ SPEMA countermeasure, but where bit s^i_c is generated
randomly while s^i_s is calculated according to Eq. 3. This configuration is a
middle step between S5R5 and S5R3C2.
• S5R3C2 - ISTe SPEMA countermeasure [23].
• S3M - Same as S3 but with random masking, RM-IST. No figure illustrates
this configuration explicitly, but consider Fig. 8, where bits {. . . , c^i, s^i, . . .}
are not added to the data vector and bits {. . . , s^i_c, s^i_s, . . .} are not generated
for the scrambling vector. However, a random bit generator flips the
content of the scrambling vector randomly and bit r is stored in the cache
together with scrambled data SD.
• S5R5M - Same as S5R5 but with random masking, RM-ISTe′′, Fig. 8.
• S5R4C1M - Same as S5R4C1 but with random masking, RM-ISTe′, Fig. 8.
• S5R3C2M - Same as S5R3C2 but with random masking, Fig. 8. This is
the full RM-ISTe proposed in this paper for defense against SPEMA and
DPEMA attacks, which renders the minimum leakage.
[Plot: leakage L (y-axis, 0 to 1) versus data vector set size (x-axis: 20, 40,
80, 160, 320, 640, 1000) for the 63-bit architecture; series S3, S5R5, S5R4C1
and S5R3C2; printed data labels 0.14, 0.25, 0.45, 0.71, 0.92, 1.00, 1.00.]
Figure 10: Information leakage achieved by DPEMA attacks applied on techniques without
random masking for a 63-bit architecture.
5.1. DPEMA attacks in non-random-masking techniques
DPEMA attack results are first presented for an architecture of 63 bits. Data
vector sets D3/5(di = 0/1) used during the attacks are of lengths from 20 to
1000 vectors. The attacks are repeated for 1000 different scrambling vectors and
the leakage function L(s) is averaged over them. Fig. 10 plots the results
for the techniques without random masking, {S3, S5R5, S5R4C1, S5R3C2}.
The x-axis shows the number of vectors in the attacking data set and the
y-axis gives the values of the leakage metric L(s). The low effectiveness of the
techniques without random masking against DPEMA is worth noting. The left
column of each group shows that leakage L(s) increases from 0.06 to 0.81 for
the most effective technique in this group, S5R3C2, when the data vector set is
changed from 20 to 1000 vectors. For the leakage level of 0.81 the number of
correctly estimated scrambling vectors is 131 out of 1000. The other techniques
exhibit a similar trend but with higher leakage levels, giving in the worst case
1, which means that all the estimations were correct. In particular, the worse
behavior of S5R4C1 and S5R5 is caused by the random generation of the
redundancy in the scrambling vectors. The case S3, without SPEMA countermeasure,
is the one that presents the poorest results under DPEMA attacks.
Another significant trend observed is that the leakage increases when the
attacking data vector set grows, independently of the technique used. Notice
that for a set of 20 the leakages of the four techniques are {0.06, 0.08, 0.08, 0.14}
while for a set of 1000 the leakages are {0.81, 0.82, 0.82, 1}. This is important
information for the adversary because he knows that with a large enough set he
can break the system completely, independently of the internal countermeasure.
[Plot: leakage L (y-axis, 0 to 0.1) versus data vector set size (x-axis: 20, 40,
80, 160, 320, 640, 1000) for the 63-bit architecture; series S3M, S5R5M,
S5R4C1M and S5R3C2M (RM-ISTe).]
Figure 11: Information leakage achieved with DPEMA attacks applied on random masking
techniques for 63-bit architecture size.
5.2. DPEMA attacks in random-masking techniques
The experiments of Fig. 10 are repeated with the techniques using random
masking, {S3M, S5R5M, S5R4C1M, S5R3C2M}; the results are plotted in Fig. 11.
For the attacking set of 1000 vectors, a significant decrease of the leakage
from 0.81 to 0.016 is observed for S5R3C2M (the proposed RM-ISTe). It is
remarkable that, among all predictions, none of the scrambling vectors is
correct. It is also worth noting that the leakage is approximately the same for all
data vector set sizes, without following any expected trend. This is particularly
interesting because one of the procedures that an adversary can use to drive the
attack is to predict the variation of the leakage vs. the number of data vectors
applied. In the techniques using random masking, the above prediction does not
provide any useful feedback. Finally, the combination with the techniques designed
against SPEMA attacks in [23] provides an additional degree of protection when
used with random masking. Compare, for the case of 1000 data vectors, the case
without SPEMA countermeasure (S3M) to the case with SPEMA countermeasure
(S5R3C2M), which give leakages of 0.026 and 0.016 respectively.
5.3. DPEMA attacks for different architecture sizes
Tab. 4 summarizes the results for several architecture sizes going from 9 to
63 bits, for attacks carried out with data vector sets of 1000 vectors. All the
numbers are averaged for 1000 attacks using different scrambling vectors.
In the table, the first column (Arch. size) indicates the size of the bus. The second
column (S3 / IST) lists the leakage for all architectures using the S3 technique,
which gives a leakage of 1 in all cases. The next three columns (S5R5 / ISTe′′,
S5R4C1 / ISTe′, S5R3C2 / ISTe) correspond to the techniques without random
masking but including a SPEMA countermeasure. In this group, a maximum
leakage of 1 is found for all techniques in the smallest architectures while the
minimum leakages are 0.815, 0.815 and 0.813 respectively for the largest architecture,
representing a reduction of up to 18.7%. In all the cases at least one scrambling
vector has been estimated correctly; this is shown by the grey background of
the cells.

Table 4: Information leakage measured for DPEMA attacks in different architecture sizes and
techniques. Attacking data vector set is 1000. Number of scrambling vectors tested is 1000.
In grey background cells at least one scrambling vector has been correctly estimated.
For each technique the leakage is followed by the leakage reduction.

Arch.  S3    S5R5          S5R4C1        S5R3C2        S3M           S5R5M         S5R4C1M       S5R3C2M
size   IST   ISTe''        ISTe'         ISTe          RM-IST        RM-ISTe''     RM-ISTe'      RM-ISTe
  9    1     1.000  0.0%   1.000  0.0%   1.000  0.0%   0.198 80.2%   0.142 85.8%   0.142 85.8%   0.110 89.0%
 12    1     0.999  0.1%   0.999  0.1%   1.000  0.0%   0.147 85.3%   0.108 89.2%   0.108 89.2%   0.088 91.2%
 15    1     0.997  0.3%   0.997  0.3%   0.999  0.1%   0.121 87.9%   0.087 91.3%   0.087 91.3%   0.065 93.5%
 18    1     0.993  0.7%   0.993  0.7%   0.997  0.3%   0.090 91.0%   0.070 93.0%   0.070 93.0%   0.054 94.6%
 21    1     0.990  1.0%   0.990  1.0%   0.993  0.7%   0.080 92.0%   0.057 94.3%   0.057 94.3%   0.050 95.0%
 24    1     0.984  1.6%   0.984  1.6%   0.986  1.4%   0.064 93.6%   0.052 94.8%   0.052 94.8%   0.046 95.4%
 27    1     0.973  2.7%   0.973  2.7%   0.981  1.9%   0.055 94.5%   0.047 95.3%   0.047 95.3%   0.036 96.4%
 30    1     0.963  3.7%   0.963  3.7%   0.974  2.6%   0.059 94.1%   0.043 95.7%   0.043 95.7%   0.037 96.3%
 33    1     0.952  4.8%   0.952  4.8%   0.957  4.3%   0.050 95.0%   0.039 96.1%   0.039 96.1%   0.033 96.7%
 36    1     0.930  7.0%   0.930  7.0%   0.945  5.5%   0.048 95.2%   0.034 96.6%   0.034 96.6%   0.033 96.7%
 39    1     0.928  7.2%   0.928  7.2%   0.928  7.2%   0.045 95.5%   0.033 96.7%   0.033 96.7%   0.029 97.1%
 42    1     0.914  8.6%   0.914  8.6%   0.921  7.9%   0.039 96.1%   0.031 96.9%   0.031 96.9%   0.025 97.5%
 45    1     0.895 10.5%   0.895 10.5%   0.903  9.7%   0.037 96.3%   0.027 97.3%   0.027 97.3%   0.025 97.5%
 48    1     0.883 11.7%   0.883 11.7%   0.889 11.1%   0.035 96.5%   0.027 97.3%   0.027 97.3%   0.023 97.7%
 51    1     0.872 12.8%   0.872 12.8%   0.883 11.7%   0.033 96.7%   0.024 97.6%   0.024 97.6%   0.021 97.9%
 54    1     0.850 15.0%   0.850 15.0%   0.859 14.1%   0.032 96.8%   0.024 97.6%   0.024 97.6%   0.021 97.9%
 57    1     0.838 16.2%   0.838 16.2%   0.845 15.5%   0.029 97.1%   0.023 97.7%   0.023 97.7%   0.021 97.9%
 60    1     0.828 17.2%   0.828 17.2%   0.821 17.9%   0.028 97.2%   0.023 97.7%   0.023 97.7%   0.018 98.2%
 63    1     0.815 18.5%   0.815 18.5%   0.813 18.7%   0.026 97.4%   0.021 97.9%   0.021 97.9%   0.018 98.2%
The following four columns (S3M / RM-IST, S5R5M / RM-ISTe′′, S5R4C1M
/ RM-ISTe′, S5R3C2M / RM-ISTe) correspond to the techniques with random
masking and therefore all of them are DPEMA countermeasures. Except for the
first, they also include a SPEMA countermeasure. The maximum leakage is
found for the smallest architecture size with 0.198 for (S3M / RM-IST),
representing a reduction of 80.2% in leakage, while the minimum leakage is found in
(S5R3C2M / RM-ISTe) for the biggest architecture size with 0.018, which
represents a reduction of 98.2%. In all four cases none of the scrambling vectors is
correctly estimated for architecture sizes larger than 18 bits; this is shown by
the white background of the cells.
5.4. Implementation costs
To evaluate the implementation costs we have followed the same strategy
as in [26] and [27], which consists in predicting them with the CACTI tool [28].
This tool generates cost predictions for cache memory architectures that can be
tuned for several parameters including line size, associativity, number of banks,
technology nodes, etc. It allows design space exploration of different alternatives
during the design phase.
We consider the following methodology. The particular logic implementation
whose costs need to be evaluated is split into sub-blocks whose architecture
should resemble a cache memory as closely as possible. Then, different cache
memories are dimensioned according to the sub-block parameters and technology,
and their cost predictions are obtained with CACTI. Finally, an artifact is
created to combine the predictions of the sub-blocks following the rules of the
global design, from which the final cost predictions are generated.
Table 5: Implementation costs of the previous and current proposed techniques.

              Size (KB)       Area occupied (mm2)   Power consumption (mW)   Access time (ns)
L2 cache       16.00            0.1479                 94.50                   0.3651
  └IST          4.00   20%      0.0684   32%           74.50   44%             0.3227   47%
  └ISTe        17.20   52%      0.1573   52%           96.80   51%             0.3701   50%
  └RM-IST       4.50   22%      0.0706   32%           75.32   44%             0.3274   47%
  └RM-ISTe     17.70   53%      0.1602   52%           97.82   51%             0.3727   51%
L2 cache       32.00            0.2394                 97.30                   0.4019
  └IST          5.65   15%      0.0486   17%           50.50   34%             0.3113   44%
  └ISTe        30.50   49%      0.2352   50%           96.67   50%             0.3973   50%
  └RM-IST       6.65   17%      0.0885   27%           83.96   46%             0.3347   45%
  └RM-ISTe     31.50   50%      0.2401   50%           97.90   50%             0.3995   50%
L2 cache       64.00            0.5493                159.40                   0.5015
  └IST          8.00   11%      0.0691   11%           57.90   27%             0.3128   38%
  └ISTe        55.52   46%      0.5023   48%          149.70   48%             0.4853   49%
  └RM-IST      10.00   14%      0.1148   17%           81.93   34%             0.3520   41%
  └RM-ISTe     57.52   47%      0.5136   48%          151.97   49%             0.4889   49%
L2 cache      128.00            1.1452                280.20                   0.6238
  └IST         11.31    8%      0.1222   10%           84.40   23%             0.3589   37%
  └ISTe       103.25   45%      0.7692   40%          202.20   42%             0.5759   48%
  └RM-IST      15.31   11%      0.1455   11%           92.49   25%             0.3605   37%
  └RM-ISTe    107.25   46%      0.7917   41%          206.47   42%             0.5839   48%
In our particular case all the scrambling techniques that we present are based
on the IST (Interleaved Scrambling Technique) architecture [21]. This technique
consists of two main blocks, the L2 cache memory itself and the scrambling
vector table, which contains sets of scrambling vectors that are selected according
to certain replacement rules, plus additional auxiliary registers and flags. The
other three main sub-blocks are the redundancy generator and checking code
modules, the scrambling circuit and the random generator. The extraction of
parameters and artifacts for the cost estimation is as follows:
• L2 cache memory – In all IST versions each cache line needs one extra
flag. In RM-IST an additional flag is included per word. In ISTe words
are extended with data redundancy.
• Scrambling vector table – In IST the size of this table grows as the square
root of the L2 cache memory size according to the rules of [21]. In the
ISTe cases the scrambling vectors (two per line) of the table are extended
with the redundancy. The delay (access time) of the cache emulating this
table is added to the L2 cache memory time as a worst-case scenario. The
area and power are added too.
• Redundancy generator and checking code modules, and scrambling circuit
– They grow linearly with the size of the data bus, so it is assumed that
the memory cache ports (L2 cache and scrambling vector table) properly
emulate the cost overhead of these three elements too.
• Random generator – We do not consider the cost of the random generator
because it is constant and independent of the architecture size. If
implemented as a pseudo-random generator, its impact is negligible with respect
to the other sub-blocks.
Results are presented in Tab. 5. Column (Size (KB)) contains the capacity
of the base L2 cache memory and the equivalent size extension of the cache
necessary to implement each one of the scrambling techniques. At the right side
of each number the increment in percentage is shown with respect to the base
L2 cache memory. Four base L2 cache memory sizes have been considered: 16,
32, 64 and 128 KB, and in each one of them the four scrambling techniques are
implemented {IST, ISTe, RM-IST, RM-ISTe}.
In the rest of the columns three costs are shown: (Area occupied), (Power
consumption) and (Access time). All of them are obtained for a technology node
of 45 nm. It is remarkable that the overheads decrease for larger cache
memory sizes, which is caused by the slower increase of the scrambling vector
table as indicated above. In particular, for a 128 KB L2 cache size and the
proposed technique (RM-ISTe) the costs are: 41% area overhead, 42% power
overhead and 48% access time overhead.
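The arithmetic behind the IST rows of the (Size (KB)) column can be illustrated with a short sketch. Two points are assumptions drawn from the numbers rather than from an explicit statement in the text: the table size in KB is taken as exactly the square root of the cache size in KB, and each printed percentage is read as the extra cost divided by the extended total (base plus extra), which is what the values of Tab. 5 appear to follow. The function names are hypothetical.

```python
import math

def ist_table_kb(cache_kb):
    """Scrambling vector table size, assumed equal to sqrt(cache size) in KB."""
    return math.sqrt(cache_kb)

def share_percent(extra, base):
    """Extra cost as a percentage of the extended total (base + extra)."""
    return 100.0 * extra / (base + extra)

for cache_kb in (16, 32, 64, 128):
    extra = ist_table_kb(cache_kb)
    print(cache_kb, round(extra, 2), round(share_percent(extra, cache_kb)))

# The same ratio applied to the 128 KB RM-ISTe row of Tab. 5
print(round(share_percent(0.7917, 1.1452)))   # area
print(round(share_percent(206.47, 280.20)))   # power consumption
print(round(share_percent(0.5839, 0.6238)))   # access time
```

Under these assumptions the sizes agree with the IST rows to within rounding (the 32 KB entry is printed as 5.65), and the last three lines give the 41%, 42% and 48% quoted above.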
6. Conclusions
In this paper, cold-boot attacks on cache memories boosted by differential
power and electromagnetic analysis (DPEMA) are considered. While it is known
that scrambling techniques like the Interleaved Scrambling Technique (IST) can
be effective against cold-boot attacks, it is demonstrated that a DPEMA can be
used to discover the internal scrambling vector and consequently to make the
cold-boot attack effective, thus breaking the security of IST.
In this paper a new strategy is presented that can be added to the IST, making
it robust against DPEMA. It is named random masking (RM), and a complete
solution, RM-ISTe, is presented which is effective against cold-boot
attacks boosted by SPEMA (static) and DPEMA (dynamic) analysis aiming to
discover the internal scrambling vector. Several examples illustrate the
operation of the proposed methodology and experiments are presented to evaluate
its effectiveness for different architecture sizes. It is seen that the leakage
emanating from the power or electromagnetic radiation is reduced by 98.2% with
respect to the plain IST technique. The cost of the implementation, considering
a technology node of 45 nm and a cache memory size of 128 KB, is 41% for the
area overhead, 42% for the power consumption overhead and 48% for the access
time overhead.
References
[1] M. J.P., Payments fraud and control survey (2015).
[2] F. Paget, Financial fraud and internet banking: threats and
countermeasures (2009).
[3] D. R. Piegdon, L. Pimenidis, Hacking in physically addressable memory,
in: Seminar of Advanced Exploitation Techniques, WS 2006/2007, Vol. 12,
2007.
[4] K. Harrison, S. Xu, Protecting cryptographic keys from memory disclosure
attacks, in: 37th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks (DSN’07), IEEE, 2007, pp. 137–143.
[5] S. Skorobogatov, Low temperature data remanence in static ram (2002).
[6] J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul, J. A.
Calandrino, A. J. Feldman, J. Appelbaum, E. W. Felten, Lest we remember,
Communications of the ACM 52 (5) (2009) 91. doi:10.1145/1506409.1506429.
[7] P. Colp, J. Zhang, J. Gleeson, S. Suneja, E. de Lara, H. Raj, S. Saroiu,
A. Wolman, Protecting data on smartphones and tablets from memory
attacks, ACM SIGPLAN Notices 50 (4) (2015) 177–189.
[8] M. Gruhn, T. Müller, On the practicability of cold boot attacks, in:
Availability, Reliability and Security (ARES), 2013 Eighth International
Conference on, IEEE, 2013, pp. 390–397.
[9] T. Müller, M. Spreitzenbarth, Frost: Forensic recovery of scrambled
telephones, in: Proceedings of the 11th International Conference on Applied
Cryptography and Network Security, ACNS’13, Springer-Verlag, Berlin,
Heidelberg, 2013, pp. 373–388. doi:10.1007/978-3-642-38980-1_23.
[10] A. Cui, M. Costello, S. J. Stolfo, When firmware modifications attack: A
case study of embedded exploitation, in: NDSS, 2013.
[11] V. Rijmen, J. Daemen, Advanced encryption standard, Proceedings of
Federal Information Processing Standards Publications, National Institute of
Standards and Technology (2001) 19–22.
[12] J. Borghoff, A. Canteaut, T. Güneysu, E. B. Kavun, M. Knezevic, L. R.
Knudsen, G. Leander, V. Nikov, C. Paar, C. Rechberger, et al., Prince–a
low-latency block cipher for pervasive computing applications, in:
International Conference on the Theory and Application of Cryptology and
Information Security, Springer, 2012, pp. 208–225.
[13] L. Su, A. Martinez, P. Guillemin, S. Cerdan, R. Pacalet, Hardware
mechanism and performance evaluation of hierarchical page-based memory bus
protection, in: Proceedings of the Conference on Design, Automation and
Test in Europe (DATE), 2009.
[14] W. Enck, K. Butler, T. Richardson, P. McDaniel, A. Smith, Defending
against attacks on main memory persistence, in: 2008 Annual Computer
Security Applications Conference (ACSAC), Institute of Electrical &
Electronics Engineers (IEEE), 2008. doi:10.1109/acsac.2008.45.
[15] S. Chhabra, Y. Solihin, i-NVMM, Vol. 39, Association for Computing
Machinery (ACM), 2011. doi:10.1145/2024723.2000086.
[16] I. Anati, J. Doweck, G. Gerzon, S. Gueron, M. Maor, A tweakable encryption
mode for memory encryption with protection against replay attacks, WO
Patent App. PCT/US2011/053,170 (2012).
URL http://www.google.com/patents/WO2012040679A3?cl=en
[17] S. Gueron, U. Savagaonkar, F. Mckeen, C. Rozas, D. Durham, J. Doweck,
O. Mulla, I. Anati, Z. Greenfield, M. Maor, Method and apparatus for
memory encryption with integrity check and protection against replay
attacks, WO Patent App. PCT/US2011/042,413 (2013).
URL http://www.google.com/patents/WO2013002789A1?cl=pt-PT
[18] B. Dolgunov, A. Aharonov, Memory randomization for protection against
side channel attacks, US Patent 8,726,040 (2014).
[19] R. V. Sai, S. Saravanan, V. Anandkumar, Implementation of a novel data
scrambling based security measure in memories for vlsi circuits, Vol. 8,
2015.
[20] Intel, 5th generation intel core processor family, intel core m processor
family, mobile intel pentium processor family, and mobile intel celeron
processor family (2015).
[21] M.-I. Neagu, L. Miclea, S. Manich, Improving security in cache memory by
power efficient scrambling technique, IET Computers & Digital Techniques
9 (6) (2015) 283–292. doi:10.1049/iet-cdt.2014.0030.
[22] P. Kocher, J. Jaffe, B. Jun, P. Rohatgi, Introduction to differential power
analysis, Journal of Cryptographic Engineering 1 (1) (2011) 5–27.
doi:10.1007/s13389-011-0006-y.
[23] M. Neagu, L. Miclea, S. Manich, Defeating simple power analysis attacks in
cache memories, in: 2015 Conference on Design of Circuits and Integrated
Systems (DCIS), Institute of Electrical & Electronics Engineers (IEEE),
2015. doi:10.1109/dcis.2015.7388557.
[24] N. Madalin, L. Miclea, J. Figueras, Unidirectional error detection,
localization and correction for DRAMs: Application to on-line DRAM
repair strategies, in: 2011 IEEE 17th International On-Line Testing
Symposium, Institute of Electrical & Electronics Engineers (IEEE), 2011.
doi:10.1109/iolts.2011.5994540.
[25] R. M. Fano, The transmission of information, Massachusetts Institute of
Technology, Research Laboratory of Electronics, 1949.
[26] J. Liu, B. Jaiyen, R. Veras, O. Mutlu, Raidr: Retention-aware intelligent
dram refresh, in: ACM SIGARCH Computer Architecture News, Vol. 40,
IEEE Computer Society, 2012, pp. 1–12.
[27] P. Papavramidou, M. Nicolaidis, Reducing power dissipation in memory
repair for high defect densities, in: 2013 18th IEEE European Test
Symposium (ETS), IEEE, 2013, pp. 1–7.
[28] Cacti tool v5.3.
URL http://quid.hpl.hp.com:9081/cacti/
[Diagram: the attack modeled as a communication channel with input s and
output s∗; ambient (countermeasure) noise enters the channel; the quantities
H(s), H(s|s∗) and I(s, s∗) are indicated.]
Figure A.12: Communication channel model used to evaluate the leakage.
Appendix A. Leakage function
In this paper, leakage means the amount of information extracted
from the scrambling vector to produce an accurate estimate of it. Information
entropy is a metric for the measurement of the amount of information carried
by a set of symbols and is commonly used as an indication of leakage in secure
systems. The evaluation of the leakage exploited by an adversary during an
attack can be modeled as the communication channel shown in Fig. A.12.
Assume that s is a scrambling vector bit and s∗ is its estimate obtained
by the attack. The attack can be viewed as a communication channel through
which two types of information are sent: signal values (scrambling vector bits)
and noise produced by the countermeasures in use. The ability of the attack to
separate the signal from the noise will determine the degree of leakage achieved.
According to the definitions of communication channels in [25], entropy H(s)
is the amount of information contained in scrambling vector bits. Conditional
entropy H(s|s∗) is the information responsible for errors in the communication
channel. In our problem, it represents the noise injected by the countermeasures,
which make estimates of s∗ more or less correlated with s. Finally, I(s, s∗) in
the output channel is the entropy of the bit ensemble s, s∗ and represents the
amount of information from the input channel s that reaches the output channel
s∗. This entropy represents the amount of leakage achieved by the attack and
will be renamed as leakage function L(s) or simply leakage. According to [25],
this leakage function is evaluated in the channel as
L(s) = I(s, s∗) = H(s)−H(s|s∗) (A.1)
which represents the balance of information in the channel. The evaluation
of entropies H() proceeds as follows.
Figure A.13: Evaluation of entropies in an attack scenario. [Diagram labels: a
random bit generator supplies s, the attack supplies s∗, and their comparison
gives c; P(s) = 1/2, H(s) = 1, H(c) = H(s | s∗),
L(s) = I(s, s∗) = H(s) − H(s | s∗), L(s) = 1 − H(c).]
Scrambling vector bits s are obtained from a random source. Hence, it can
be assumed that their probability is P(s) = P(s = 1) = 1/2. Since entropy
is the average information (log function) carried by the binary symbols,
we have $H(s) = -[P(s) \log_2 P(s) + P(\bar{s}) \log_2 P(\bar{s})]$, where
$P(\bar{s}) = 1 - P(s)$ and the base-2 logarithm is used, and thus H(s) = 1
(Fig. A.13). This indicates that scrambling vectors are generated with the
maximum possible amount of information and accordingly have the highest
uncertainty.
The conditional entropy H(s | s∗) is calculated from the bit probabilities
using the following equation:
$$H(s \mid s^*) = -\sum_{\substack{i = s, \bar{s} \\ j = s^*, \bar{s}^*}} P(i, j) \log_2 P(i \mid j) \tag{A.2}$$
where P(i, j) and P(i | j) are the joint and conditional probabilities,
respectively. These probabilities can be assessed in real experiments using the
XOR logic function, cf. Fig. A.13. If the probability at one of the XOR inputs
is P(s) = 1/2, then the following symmetries are observed in these probabilities:
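$$\begin{aligned}
P(s, s^*) &= P(\bar{s}, \bar{s}^*) = P(\bar{c})/2 \\
P(s, \bar{s}^*) &= P(\bar{s}, s^*) = P(c)/2 \\
P(s \mid s^*) &= P(\bar{s} \mid \bar{s}^*) = P(\bar{c}) \\
P(s \mid \bar{s}^*) &= P(\bar{s} \mid s^*) = P(c)
\end{aligned} \tag{A.3}$$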
Substituting these symmetries into Eq. A.2 yields the entropy of the XOR
output c:
$$H(s \mid s^*) = H(c) = -[P(c) \log_2 P(c) + P(\bar{c}) \log_2 P(\bar{c})] \tag{A.4}$$
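As a worked check of the step from Eq. A.2 to Eq. A.4, the four terms of the sum can be written out explicitly. This sketch assumes that an overbar denotes the complemented bit, that c is the XOR of s and s∗ (so P(c) is the probability that the estimate is wrong), and that P(c̄) = 1 − P(c). It only expands the substitution stated above and adds no new result.

```latex
\begin{aligned}
H(s \mid s^*) &= -\Big[ P(s, s^*) \log_2 P(s \mid s^*)
                 + P(\bar{s}, \bar{s}^*) \log_2 P(\bar{s} \mid \bar{s}^*) \\
              &\qquad + P(s, \bar{s}^*) \log_2 P(s \mid \bar{s}^*)
                 + P(\bar{s}, s^*) \log_2 P(\bar{s} \mid s^*) \Big] \\
              &= -\Big[ 2 \cdot \tfrac{P(\bar{c})}{2} \log_2 P(\bar{c})
                 + 2 \cdot \tfrac{P(c)}{2} \log_2 P(c) \Big] \\
              &= -\big[ P(c) \log_2 P(c) + P(\bar{c}) \log_2 P(\bar{c}) \big] = H(c)
\end{aligned}
```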
Combining Eqs. A.1 and A.4 with the boundary condition H(s) = 1 from Fig.
A.13, the final expression for the leakage function is
$$L(s) = 1 - H(c) \tag{A.5}$$
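As a numeric illustration of Eq. A.5, consider an attack whose estimate is wrong for one bit in ten. The value P(c) = 0.1 is chosen here purely for illustration and is not taken from the experiments in this paper.

```latex
\begin{aligned}
H(c) &= -\left[ 0.1 \log_2 0.1 + 0.9 \log_2 0.9 \right] \approx 0.332 + 0.137 = 0.469 \\
L(s) &= 1 - H(c) \approx 0.531
\end{aligned}
```

Under this assumption, roughly half a bit of information leaks per scrambling bit even though 90% of the estimated bits are correct.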
Figure A.14: Simulation of the circuit in Fig. A.13 and Eq. A.5, and theoretical curve derived
from Eq. A.1. [Plot: L(s) and I(s, s∗) on the vertical axis, from 0 to 1, versus
cor(s, s∗) on the horizontal axis, from −1 to 1.]
For an open system, the evaluation of the leakage function starts by
collecting the individual bits of the scrambling vector S and the estimated scrambling
vector S∗. Since the attack is repeated many times using different (usually
random) data vectors, a comparison vector C is generated after each attack,
the probability P(c) is estimated from its bits, and the leakage function is
then evaluated. To improve the accuracy of the estimate, several attacks can
be performed with the same scrambling vector S, and the bits of all resulting
vectors C are pooled to estimate the leakage function. Finally, if the
strength of the countermeasure must be assessed rigorously, the same
procedure is applied to a set of experiments in which the scrambling vector
is also changed.
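The estimation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names `binary_entropy` and `estimate_leakage` are hypothetical, the bit vectors are made up, and the comparison vector C is assumed to be the bitwise XOR of S and S∗ as in Fig. A.13.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary source with P(1) = p (the form of Eq. A.4)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limit of p*log2(p) as p -> 0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def estimate_leakage(attacks):
    """Estimate L(s) = 1 - H(c) from (S, S_star) bit-vector pairs.

    Each pair holds a scrambling vector and the attacker's estimate of it.
    The comparison bits c = s XOR s* of all attacks are pooled before P(c)
    is estimated, as suggested in the text for improving accuracy.
    """
    c_bits = [s ^ e for S, S_star in attacks for s, e in zip(S, S_star)]
    p_c = sum(c_bits) / len(c_bits)
    return 1.0 - binary_entropy(p_c)

# Hypothetical 8-bit scrambling vector and three illustrative attack outcomes.
S = [1, 0, 1, 1, 0, 0, 1, 0]
perfect = [(S, S)]                               # every bit guessed correctly
inverted = [(S, [1 - b for b in S])]             # every bit guessed inverted
half_wrong = [(S, [0, 0, 0, 1, 0, 1, 1, 1])]     # 4 of 8 bits wrong

for name, attacks in [("perfect", perfect), ("inverted", inverted),
                      ("half wrong", half_wrong)]:
    print(name, estimate_leakage(attacks))
```

Pooling the bits of several attacks simply lengthens the list passed to `estimate_leakage`, which narrows the statistical error of the P(c) estimate.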
To illustrate Eq. A.5, a short Monte Carlo simulation was run for the
circuit in Fig. A.13. A total of 2000 random bits were generated for each s
and s∗ pair, and the correlation between them was varied over the interval
[−1, 1]. Fig. A.14 summarizes the results of Eq. A.5 for 200 simulations,
together with the theoretical curve derived from Eq. A.1. It is interesting to
see that the maximum amount of information is leaked not only when the attack
predicts the same bit, $s = s^*$ (cor(s, s∗) = 1), but also when it predicts
the inverted one, $s = \bar{s}^*$ (cor(s, s∗) = −1).
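A simulation of this kind can be sketched as follows. The sketch is not the authors' setup. It assumes that the countermeasure noise is an independent random bit flip with probability P(c) = (1 − ρ)/2, which for unbiased bits yields a correlation cor(s, s∗) = ρ, and it sweeps only 11 correlation values. The function names and the seed are hypothetical.

```python
import math
import random

def binary_entropy(p):
    """Entropy in bits of a binary source with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def simulate_leakage(rho, n_bits=2000, rng=random):
    """Empirical L(s) = 1 - H(c) for one run at target correlation rho."""
    p_c = (1.0 - rho) / 2.0                  # assumed link between rho and P(c)
    mismatches = 0
    for _ in range(n_bits):
        s = rng.getrandbits(1)               # random scrambling bit, P(s) = 1/2
        c = 1 if rng.random() < p_c else 0   # noise bit: 1 flips the estimate
        s_star = s ^ c                       # attacker's estimate of s
        mismatches += s ^ s_star             # XOR comparison, as in Fig. A.13
    return 1.0 - binary_entropy(mismatches / n_bits)

def theoretical_leakage(rho):
    """Eq. A.5 evaluated with P(c) = (1 - rho) / 2."""
    return 1.0 - binary_entropy((1.0 - rho) / 2.0)

rng = random.Random(0)                       # fixed seed for repeatability
for k in range(-10, 11, 2):
    rho = k / 10
    sim = simulate_leakage(rho, rng=rng)
    print(f"cor={rho:+.1f}  simulated={sim:.3f}  theory={theoretical_leakage(rho):.3f}")
```

Both curves are symmetric about zero correlation under this noise model, which matches the observation that an always-inverted estimate leaks as much as an always-correct one.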
