Timing Cache Accesses to Eliminate Side Channels in Shared Software by Ojha, Divya & Dwarkadas, Sandhya
Timing Cache Accesses to Eliminate Side Channels in Shared Software
Divya Ojha and Sandhya Dwarkadas
Department of Computer Science, University of Rochester
{dojha,sandhya}@cs.rochester.edu
arXiv:2009.14732v1 [cs.CR] 30 Sep 2020
Abstract—Timing side channels have been used to extract
cryptographic keys and sensitive documents, even from trusted
enclaves. In this paper, we focus on cache side channels created
by access to shared code or data in the memory hierarchy.
This vulnerability is exploited by several known attacks, e.g.,
evict+reload for recovering an RSA key and Spectre variants
for data leaked due to speculative accesses.
The key insight in this paper is the importance of the first
access to the shared data after a victim brings the data into the
cache. To eliminate the timing side channel, we ensure that the
first access by a process to any cache line loaded by another
process results in a miss. We accomplish this goal by using
a combination of timestamps and a novel hardware design to
allow efficient parallel comparisons of the timestamps. The
solution works at all the cache levels and defends against
an attacker process running on another core, same core, or
another hyperthread. Our design retains the benefits of a
shared cache: allowing processes to utilize the entire cache
for their execution and retaining a single copy of shared code
and data (data deduplication).
Our implementation in the GEM5 simulator demonstrates
that the system is able to defend against RSA key extraction.
We evaluate performance using SPECCPU2006 and observe
overhead due to first access delay to be 2.17%. The overhead
due to the security context bookkeeping is of the order of 0.3%.
1. Introduction
Shared memory resources expose timing side channels
that can reveal information even in the presence of security
measures such as process isolation and enclave separation.
Cache side channels leveraging shared software have been
shown capable of extracting cryptographic keys, sensitive
documents, and data even from cryptographically secured
enclaves [5].
In this paper, we focus on cache side channels created
by access to shared software in the memory hierarchy.
Shared software is essential to keeping
system costs low. For instance, shared libraries (code) are
an important optimization in modern computing systems
to help keep the memory footprint low. Likewise, ser-
vices providing access to large data stores result in data
being shared across untrusted client requests. Access to
the shared code or data leaves a footprint in the memory
hierarchy, which has been exploited by several known at-
tacks [34] [12] [21] [32] [10] [5] [17].
A typical cache side channel attack when sharing
software involves evicting the shared data (e.g., code
from a shared library) from the cache hierarchy and re-
accessing it after the victim’s execution (using evict+reload
or flush+reload [32]). A fast re-access is indicative of an
access to the shared location by the victim. If the shared
library access is indexed by secret data, the attacker
can infer the victim’s secret. This attack model is used in
attacks to leak cryptographic keys [32], in Spectre I, Spectre
II [17], and NetSpectre [26], in cross-tenant attacks to leak
data in clouds providing Platform-as-a-service [34], and in
discovering key strokes [28].
In this work, we design and evaluate a low-overhead
hardware-software solution to defend against attacks using
shared software. Our key insight is recognizing the impor-
tance of the attacker’s first access to the data after a victim
has brought the data into cache. We want to ensure that
the first access by a process to any cache line loaded by a
different process is a miss. In software, we save and restore
the timestamp of when an execution context was last run.
In hardware, we track cache line reload timestamps and
propose a novel hardware design to allow efficient parallel
comparison of a set of timestamps.
Prior solutions for eliminating cache side channels using
shared libraries partition the cache [29] [22], which reduces
the effective cache size for individual processes and does not
provide security guarantees or cache side channel resistance
on state-of-the-art partitioners [13] when data is shared
across contexts. Other solutions protect accesses only to the
LLC [20] [31]. Our goal is to retain the benefits of a shared
cache by allowing each process access to the entire cache,
and to protect every level of cache.
We create a “per-process view” of cache occupancy by
delaying (treating as a miss) the first access to a cache
line (i.e., a cache hit) that has been brought into the cache
by a different process. Accesses beyond the first will be
serviced as a hit. The delayed first access presents the
process with the illusion of timing isolation from other
processes. While data is shared between processes, the
isolation is achieved by giving every process the impression
that data is brought into the cache by its own access. This
approach breaks the fundamental premise of attacks using
shared software. The reduction in the performance due to
such delay can be considered elemental to the design of a
secure cache while at the same time avoiding a potential
O(n) space consumption for n processes sharing data in
a partitioned cache. Delay is incurred only when data is
evicted and reloaded, so that the steady-state in-cache shar-
ing is unaffected. As a consequence of this defense, systems
can choose to deploy memory deduplication techniques
to reduce memory footprint [15], [1] without the fear of
creating an avenue for cache side channels through a shared
software stack.
We propose and evaluate a timestamp-based cache ac-
cess management system that combines novel hardware and
software support to provide timing isolation. Each cache line
is augmented with a “load-time” timestamp and a bit per
hardware context representing whether the cache line has
been accessed by the execution context. This bit is checked
upon a cache hit and the request is serviced if the bit is set,
otherwise a miss penalty is incurred and a request is sent
down the memory hierarchy. Software saves (and restores)
these bits along with the “context-switch” timestamp at a
context switch. A novel bit-serial comparison logic allows
for fast parallel “load-time” and “context-switch” timestamp
comparisons.
We evaluate both the security and performance of our
defense on the GEM5 simulator [4] and demonstrate its
effectiveness against attacks using microbenchmarks and
an RSA attack. Our defense is able to prevent the classic
RSA attack used to demonstrate flush+reload attacks.
Performance evaluation using SPECCPU2006 shows an average
overhead of 2.17%, due to the delayed accesses. The over-
head due to the security context bookkeeping is about 0.3%.
This paper makes the following key contributions:
• The insight that forcing a process’s first access to a
cache line to miss, whenever the line was brought into
the cache by a different process, is sufficient to prevent
cache side channel attacks through shared software.
• A timestamp-based solution to creating a per-process
view of cache line occupancy to prevent such attacks.
• A fast bit-serial comparison logic to compare times-
tamps for all the cache lines simultaneously.
• A simulation-based evaluation of the potential over-
heads of timing isolation and a demonstration that our
proposed solution prevents real-world attacks.
2. Background
2.1. Cache Side Channels
Information leaked as a result of shared cache utilization
is collectively referred to as cache side channels. Mecha-
nisms to exploit cache side channels were first exposed as
early as in 1992 [14], and different classes of attacks relying
on cache access timing have been developed since then.
Two types of information leaks are possible depending
on whether or not there is shared memory between the
attacker and the victim. With no shared memory between the
attacker and the victim, the attacker can only learn the cache
set accessed by the victim using a “prime+probe” style of
attack [6]. In the presence of shared software, an attacker can
learn the line accessed by the victim using an “evict+reload”
or “flush+reload” style attack [32], [11]. Defenses against
“prime+probe” attacks such as caches using randomized
placement [23] [19] are not effective against “evict+reload”
or “flush+reload” style attacks. The latter is a low-noise,
high-bandwidth, and more efficient form of attack. This
work addresses this second style of attacks.
Figure 1: Shared Library Attack in Cache. The attacker and the victim share a hardware cache and a software library or database. The attacker evicts sh_lib[x1, ..., xn]; the victim accesses sh_lib[secret]; the attacker then reloads the candidates, sees a read miss on sh_lib[x1] and sh_lib[x2] and a read hit on sh_lib[x3], and infers secret = x3.
2.2. Shared Library Attacks
Shared libraries contain commonly used subroutines, which
can be mapped directly into the user program. The same
physical memory is mapped into different processes’ virtual
address spaces. Shared libraries are used to reduce memory
footprint and improve memory hierarchy efficiency. How-
ever, they create the potential for leaked memory access
patterns through cache side channels.
Side channels using shared libraries were earlier assumed
to affect only cryptographic routines. More recent attacks
have shown that they are a handy gadget
for more sophisticated attacks like Spectre [17][26][18].
They are also capable of leaking keystrokes from another
process [28], leaking passwords in cloud environments such
as an Amazon EC2 server, and leaking data across Virtual
Machines [34].
2.3. Threat Model
The threat model under consideration has a separate
attacker and victim process sharing some software stack.
They could be running simultaneously on the same (hy-
perthreaded) or different cores, or interleaved in time, and
the attack can be conducted from any level of the cache.
The attack is as shown in Figure 1 and has the following
sequence:
1) The attacker and victim share a software library and
a hardware cache. The access to the shared library is
dependent on or is indexed using the victim’s secret
data.
2) The attacker evicts a shared location from the cache
hierarchy.
3) It then waits for the victim’s execution.
4) The attacker subsequently reloads the same shared lo-
cation and determines that the shared location was also
accessed by the victim if it hits in the cache, determined
by timing the access.
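The four steps above can be sketched on a toy model of a conventional shared cache with no defense. This is a minimal illustration only: the `SharedCache` class, the `evict_reload` helper, and the eight-entry candidate range are hypothetical names and values chosen for the example, and a boolean hit flag stands in for a real timing measurement.

```python
class SharedCache:
    """Toy conventional cache: a set of resident line addresses, no defense."""

    def __init__(self):
        self.lines = set()

    def evict(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Return True on a hit; a miss fills the line."""
        hit = addr in self.lines
        self.lines.add(addr)
        return hit


def evict_reload(cache, candidates, victim_secret):
    # Step 2: the attacker evicts every candidate shared location.
    for addr in candidates:
        cache.evict(addr)
    # Step 3: the victim runs and touches the line indexed by its secret.
    cache.access(victim_secret)
    # Step 4: the attacker reloads each candidate; a hit marks the victim's line.
    return [addr for addr in candidates if cache.access(addr)]


leaked = evict_reload(SharedCache(), candidates=list(range(8)), victim_secret=3)
print(leaked)
```

In this model the only candidate that hits on reload is the one the victim touched, which is exactly the signal the attack relies on.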
This attack model is self-sufficient in the sense that
it has been demonstrated to be capable of leaking RSA
keys when using the GnuPG shared library [32], [12]. It
is also a low-noise, high-bandwidth tool for building more
sophisticated attacks like Spectre-I & II, SpectreRSB [18],
and NetSpectre [26]. Each of these more recent attacks
relies on leaking shared library access patterns through cache
side channels and depend on a fast reload preceded by an
eviction.
2.4. Goal & Insights
The potential scope of cache side channel exploits includes
those devised in Spectre-style attacks based on speculation [17]
and is not limited to specific application domains
such as cryptography. Hence, domain- or attack-specific
solutions are not sufficient.
The goal of this work is to design a cache that allows
processes to reap the benefits of sharing both cache and data
without leaking access pattern information via timing side
channels. Preventing information leak requires identifying
and slowing down the first access to data present in the cache
but brought in by a different process. As a consequence of
this design, systems can choose to deploy memory dedu-
plication techniques to reduce memory footprint [15], [1]
without fear of creating an avenue for a reuse-based covert
channel attack.
3. Per-Process Cache Line Visibility
The goal of our design is to eliminate cache side chan-
nels through shared memory while retaining the benefits of
memory sharing: reduced bandwidth and energy consump-
tion for data movement and reduced space consumption in
the cache. Every process should see its first access to a
resident cache line that it did not bring in with a delay
equivalent to a miss in the cache. Once a cache line has
been accessed by a process, its subsequent accesses to the
data from the same cache line are allowed to go through as
a hit (presuming the data has not been evicted). This allows
different processes to share access to the same cache line
without revealing to one another if the cache line was made
available in the cache by another process. Compared to the
solutions that rely on cache partitioning (for example, Intel’s
cache allocation technology [20]), this approach does not
restrict the size of usable cache for any process. It is also
applicable to any level of the cache including L1 and LLC.
3.1. First Access
A process’s first access refers to its first access to an
existing cache line that has been brought in due to access
by another process. It does not refer to a program’s first use
of a piece of data (commonly referred to as a cold access).
A cache line can have as many first accesses as the number
of processes accessing it. If a cache line is evicted and later
brought back by a process, it is considered unaccessed by
other processes.
The importance of the first access lies in the construct of
the attack. If the attacker times its first access after evicting
data from the cache hierarchy and is able to detect a cache
hit, the attacker is able to infer the victim’s memory access
patterns. Beyond the first access, a fast access or a cache hit
does not provide any clue about the data access pattern of
another process. Based on this key observation, the defense
works by disallowing fast first accesses.
3.2. Timestamp-Based Design
We propose and evaluate a timestamp-based cache ac-
cess management system that combines novel hardware and
software support for timing isolation. Our design is based on
the observation that the attack under consideration exploits
the caching benefits due to another process. The attacker
evicts the shared memory and expects its subsequent access
to be fast if the victim has accessed the same shared memory
location. Hence we propose to identify and delay the first
access of a process, so the attacker is unable to infer whether
the memory was cached beforehand. Subsequent accesses
by the process proceed as a cache hit. Figure 2 provides an
overview of the hardware modifications.
Each cache line is augmented with a “load-time” times-
tamp to store the time at which the line was loaded. An
additional bit (security bit or sbit) per hardware context
represents whether the cache line has already been accessed
by the context. When a cache line is loaded, the sbit for the
loading context is set and the sbits for all other hardware
contexts are reset. For a cache hit, the sbit of the accessing
context is checked. When the sbit is set, the access is al-
lowed to proceed as a hit. Otherwise the access is recognized
as a first access and delayed, and the sbit is set so that the
future accesses can proceed as a hit. Accesses are delayed
by sending the request down the memory hierarchy but not
filling the cache with the received data. This mechanism
is implemented at every level of cache in the memory
hierarchy.
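The sbit rules in this subsection can be sketched as a small single-level software model. This is only an illustration under simplifying assumptions: the class names `Line` and `SbitCache`, the unbounded dictionary of lines, and the string return values are hypothetical, and the model reports the outcome of an access rather than its latency.

```python
class Line:
    def __init__(self, num_contexts, loader, now):
        self.load_time = now                      # "load-time" timestamp
        self.sbits = [False] * num_contexts       # one sbit per hardware context
        self.sbits[loader] = True                 # only the loading context is set


class SbitCache:
    def __init__(self, num_contexts):
        self.num_contexts = num_contexts
        self.lines = {}                           # addr -> Line (capacity ignored)
        self.now = 0

    def access(self, ctx, addr):
        self.now += 1
        line = self.lines.get(addr)
        if line is None:
            # Ordinary miss: fill the line on behalf of ctx.
            self.lines[addr] = Line(self.num_contexts, ctx, self.now)
            return "miss"
        if not line.sbits[ctx]:
            # First access by ctx to a line loaded by another context:
            # delay it like a miss, then allow later accesses to hit.
            line.sbits[ctx] = True
            return "delayed"
        return "hit"

    def evict(self, addr):
        # Dropping the line discards every context's sbit with it.
        self.lines.pop(addr, None)


cache = SbitCache(num_contexts=2)
outcomes = [cache.access(0, 0x40), cache.access(1, 0x40),
            cache.access(1, 0x40), cache.access(0, 0x40)]
print(outcomes)
```

Context 1 sees a delayed first access to the line that context 0 loaded, after which both contexts hit on it until the line is evicted.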
The sbit save and restore can be done by any trusted
computing base library at the time of context switch. In our
implementation, we allow the OS to save and restore the
process-specific sbits.
In order to retain caching behavior at context switch
and still provide timing isolation, software saves the sbits
for the context along with the “context-switch” timestamp
(Ts) at a context switch. Software also restores the sbits to
the cache in the corresponding hardware context when the
process resumes execution.
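Figure 2: Timestamp-Based Cache Access. A processor core with two hardware contexts (T1, T2) exchanges requests and responses with a cache of eight lines. Each line carries a load-time timestamp (Tc0 to Tc7) and one sbit per context (T1 sbits, T2 sbits). A timestamp comparator, built from a comparator SRAM array with a transpose gate and a bit-line peripheral, compares the Tc values against Ts.
/* at context switch */
restore_sbits();
if (Ts < Tc)                        /* line loaded after the process last ran */
    reset sbit;
/* memory access */
if (cache_miss)
    request_mem;
    handleFill;                     /* fill the cache with the response */
    sbit = 1;
else if (cache_hit) & (sbit == 0)   /* first access: delay it */
    request_mem;
    no_handleFill;                  /* response data is not filled */
    sbit = 1;
/* cache line replacement */
if (line_evicted)
    for_all_threads:
        sbits = 0;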
An sbit reset is required when Ts is older than the line’s
“load-time”. A novel bit-serial, timestamp-parallel compar-
ison logic allows fast parallel “load-time” and “context-
switch” (Ts) timestamp comparisons. The details of the
implementation are described in Section 4.
In Figure 2, we show a cache consisting of 8 cache lines
accessed by two hardware contexts. The hardware support
added to a conventional cache is as follows:
• A bit-serial, timestamp-parallel comparison logic with
transpose gate and bitline peripherals, to compare
timestamps efficiently.
• A per cache line, per hardware context bit (security bit
or sbit).
• A shift register to hold Ts, the timestamp indicating the
time when the process last executed, for the process
about to begin execution due to a context switch.
Our design leverages the benefits of caching across
context switches as long as a cache line is not evicted. Two
processes running in an interleaved fashion and accessing
the same memory location will continue to enjoy fast access
as long as the data is not evicted from the cache. After each
eviction, each process will individually see a delayed first
access. Hence, our design leverages locality across context
switches while providing timing isolation, something that
cannot be achieved by simply flushing the cache on a context
switch. The mechanism described here is a processor feature
which can be turned off if the processes are trusted and
cache attacks are not a concern.
4. Implementation
The following subsections describe the implementation
details of each hardware modification and the software
support required for the defense.
4.1. First Access Delay Mechanisms
On a traditional cache access, data is returned to the
processor if a tag and state lookup succeeds. Otherwise, the
access incurs a miss and the request is passed on to the next
level in the memory hierarchy. With our cache design, the
sbit for the cache line is checked in addition to the state and
tag bits. An access is considered a hit only if in addition to
the above, the sbit of the cache line is set, in which case
data is returned to the processor from the cache.
A reset sbit indicates that the current process has not ac-
cessed the cache line. If the sbit is reset, the response to the
processor is delayed by sending the request down the memory
hierarchy. Once the response is received, the received data
is discarded, and the data in the cache line is forwarded to
the processor. The sbit is set to ensure that future accesses
to the cache line by the process do not result in additional
traffic and are treated without additional delay.
The rationale behind sending a request down the mem-
ory hierarchy even when the data is available in the cache
is to make the first access see a response latency equivalent
to the variable access latency it would have incurred on a
miss. It is possible that a context’s sbit is reset in a higher-
level (closer to the processor) cache but set in a lower-level
cache due to its larger capacity. Sending a request down
the memory hierarchy ensures that if the requested data is
available in a lower-level cache and has the sbit set, the
request is serviced with the lower cache response latency.
The data received in the response is, however, discarded, as
the cache has the most recent copy of the data.
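The effect of forwarding a delayed first access can be sketched with a two-level model. This is a rough illustration, not the simulated design: the `Level` class, the additive latency model, and the cycle counts `L1_LAT`, `L2_LAT`, and `MEM_LAT` are all assumptions made for the example.

```python
L1_LAT, L2_LAT, MEM_LAT = 4, 12, 100   # hypothetical latencies in cycles


class Level:
    def __init__(self, latency, below=None):
        self.latency = latency
        self.below = below               # next level toward memory, or None
        self.sbits = {}                  # addr -> set of contexts with sbit set

    def access(self, ctx, addr):
        owners = self.sbits.get(addr)
        if owners is not None and ctx in owners:
            return self.latency          # ordinary hit at this level
        # Miss or first access: forward the request down the hierarchy.
        lower = self.below.access(ctx, addr) if self.below else MEM_LAT
        if owners is None:
            self.sbits[addr] = {ctx}     # fill; other contexts' sbits stay reset
        else:
            owners.add(ctx)              # first access: response data is discarded
        return self.latency + lower

    def evict(self, addr):
        self.sbits.pop(addr, None)


l2 = Level(L2_LAT)
l1 = Level(L1_LAT, below=l2)
cold = l1.access(0, 64)      # context 0 loads the line from memory
first = l1.access(1, 64)     # context 1's first access is delayed at both levels
again = l1.access(1, 64)     # later accesses hit in L1
l1.evict(64)
l1.access(0, 64)             # context 0 reloads the line into L1 from L2
from_l2 = l1.access(1, 64)   # sbit reset in L1 but still set in L2
print(cold, first, again, from_l2)
```

In this model the first access by context 1 costs the same as a full miss, while the final access is served with the L2 latency because context 1's sbit survived there.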
When a cache line is evicted or invalidated, all sbits are
reset. When a cache line is loaded, the sbit for the hardware
context loading the line is set; the sbits for all other hardware
contexts sharing the cache remain reset.
On a context switch, Tc for each cache line is compared
against Ts (loaded into a special register by software) for
the context being resumed; the sbits for lines that have Tc
greater than Ts are reset to enforce delayed first access.
Thus, the sbit state is changed as follows:
• Reset when a process is first initialized
• Reset by the timestamp comparison logic when the
process is scheduled
• Reset when a cache line is evicted or invalidated
• Set for the requesting hardware context when a cache
line is loaded; reset on all other hardware contexts
• Set for the requesting hardware context after a first
access to a previously loaded cache line
Figure 3: Cache Access Flow Chart. Process creation resets the sbits and sets Ts=0 before the process is scheduled. On a memory access, a cache hit with the sbit set sends the response; a cache hit with sbit==0 sends a memory request and sets the sbit before the response is sent; a miss fetches the data from memory and sets the sbit. On a context switch, the sbits and Ts are saved, the next process’s sbits and Ts are restored, and the sbit is reset for every line with Tc>Ts.
Software saves and restores the sbits for a process executing
on a hardware context at the time of a context switch.
Additionally, software maintains Ts for each process, which
is the time the process was most recently preempted. A
newly created process has both Ts and sbits reset when it is
scheduled for the first time.
The flow chart in Figure 3 represents the sequence of
events on a cache access and the required bookkeeping at
context switch.
4.2. Per-Process sbits Copy and Update
The sbits are saved and restored on a context switch to
preserve caching benefits across context switches. If the sbits
were not saved and instead reset on every context switch,
this would be equivalent to flushing the cache on every
context switch, which can impact performance heavily [7].
The number of 64-byte (cache line size) memory ac-
cesses required to save or restore sbits is dependent on
cache size. A small 64KB L1 cache requires only 2 64-byte
memory accesses, while a larger 8MB L3 cache requires
256 64-byte memory accesses.
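The two figures above follow from one sbit per cache line for the context being saved. A short sketch of the arithmetic, where the function name `sbit_transfer_accesses` is hypothetical:

```python
LINE_BYTES = 64


def sbit_transfer_accesses(cache_bytes, line_bytes=LINE_BYTES):
    """Number of line-sized memory accesses needed to save one context's sbits."""
    num_lines = cache_bytes // line_bytes      # one sbit per cache line
    bits_per_access = line_bytes * 8           # 512 sbits fit in one 64-byte access
    return -(-num_lines // bits_per_access)    # ceiling division


print(sbit_transfer_accesses(64 * 1024))        # 64KB L1
print(sbit_transfer_accesses(8 * 1024 * 1024))  # 8MB L3
```

A 64KB cache has 1024 lines, so its 1024 sbits occupy 128 bytes, or 2 accesses; an 8MB cache has 131072 lines, so its sbits occupy 16KB, or 256 accesses.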
Restored sbits cannot be used as is since they are stale
and need to be updated based on any changes in the cache.
If a cache line is evicted while a process is preempted,
its corresponding saved sbit in memory will not be up to
date. To update the sbits for cache lines that might have
been evicted, invalidated, or reloaded while the process was
preempted, we use the Ts timestamp. Ts indicates the last
time the sbits were brought up to date, so any cache lines
loaded after that time would not have been accessed by the
process. When a process resumes execution, its restored Ts
is compared with the Tc of every cache line in parallel, and
the sbits for all cache lines with Tc greater than Ts are reset.
Timestamp comparisons are triggered only at the time of a
context switch and prior to resuming a process. Subsequent
accesses need no comparison since the sbits now contain
the necessary information.
Figure 4: Transpose SRAM Array for Timestamps. An array of SRAM cells with sense amplifiers (SA) and drivers (DR) along both dimensions, accessed through a bit-line peripheral and a transpose interface.
We discuss the comparison of Tc and Ts, and the mechanism
for updating large arrays of sbits in time that is
independent of the number of cache lines (proportional to
the number of Tc bits), in the following subsection.
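The save, restore, and update steps of this subsection can be sketched in software. The function names `save_context` and `restore_context` and the list-based layout are assumptions for illustration; the loop below walks the lines one by one, whereas the hardware performs the comparison on all lines in parallel.

```python
def save_context(cache_sbits, now):
    """Snapshot the outgoing context's sbits together with its Ts."""
    return list(cache_sbits), now


def restore_context(saved_sbits, ts, load_times):
    """Rebuild the sbits for a resuming context.

    A line whose load-time Tc is greater than Ts was loaded while the
    process was preempted, so its saved sbit is stale and must be reset.
    """
    return [sbit and tc <= ts for sbit, tc in zip(saved_sbits, load_times)]


# The process is preempted at time 10 having accessed lines 0, 1, and 3.
saved, ts = save_context([True, True, False, True], now=10)
# While it was preempted, line 1 was evicted and reloaded at time 12.
load_times = [3, 12, 5, 10]
restored = restore_context(saved, ts, load_times)
print(restored)
```

Only line 1 loses its sbit, so the resuming process keeps its fast accesses to lines 0 and 3 and sees a delayed first access to the reloaded line.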
4.3. Bit-Serial, Timestamp-Parallel Comparison of
Timestamps
A regular data access from memory is bit-parallel, i.e., all
the bits of a word are accessed at a time. Accessing SRAM
arrays in bit-parallel fashion implies that the time required
would be proportional to the number of cache lines. In order
to perform parallel comparisons of cache-line timestamps
(Tc) and Ts, we store the per cache-line Tc timestamps
along with the cache-line’s sbits in an SRAM array in a
transposed fashion, similar to that proposed in the neural
cache work [8]. The result is computation performed in a
bit-serial [3] and word-parallel (timestamp-parallel) manner.
The transpose memory unit [8] uses 8-T bit cells and
two sets of sense amps and drivers to access data in both
regular and transposed modes. While access times will be
higher compared to a 6-T SRAM cell, accesses can be
made in parallel with the much larger cache data arrays.
Figure 4 shows the timestamp array and comparison logic,
constructed with the 8-T multi-access SRAM cells. The
‘transpose interface’ is used for the regular operation of the
cache, which is when timestamps are updated and sbits of
other contexts are reset, or an sbit needs to be looked up
or set. The ‘regular’ bit-line peripheral interface is used for
sbit saves and restores, as well as for parallel timestamp
comparisons and sbit resets.
After the process-specific sbits are loaded into the
SRAM array in the sbits for the corresponding hardware
context, they need to be updated with the information about
the cache lines that have been evicted while the process
was preempted. This is done by comparing the Tc and the
restored Ts. The transposed timestamps allow a bit-serial
and timestamp-parallel comparison, taking time linear in the
number of bits in the timestamp (32 in our experiments). The
logic required for the timestamp comparisons and reset of
sbits is shown in Figure 5.
4.3.1. Bit-Serial Comparison Logic. Bit-serial computation
allows us to simplify the comparison logic. The greater
of two unsigned integers can be determined by comparing
their bits sequentially starting from the MSB (most significant
bit): one of the two numbers can be declared larger
as soon as the first differing bit is encountered, since the
larger number has a 1 at that position
where the other number has a 0. We codify
the above algorithm in the following scheme, iterating from
the MSB:
• If the bit position under consideration has a 1 for only
one of the two numbers, that number can be marked as
greater and the comparison is complete. This behavior
can be checked by performing an xor of the two bits.
• If the bit position under consideration has a 0 for both
the numbers, the next bit position is considered.
• If the bit position under consideration has a 1 for both
the numbers, the next bit position is considered.
For instance, the greater of the two numbers ‘1100’ and
‘0101’ can be determined as the first number ‘1100’ by
looking at the MSB.
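The scheme above can be written as a short routine. This is a software sketch of the comparison rule only, with the hypothetical name `compare_msb_first`; it is not a model of the circuit.

```python
def compare_msb_first(a, b, width):
    """Return 1 if a > b, -1 if a < b, 0 if equal, scanning from the MSB."""
    for i in range(width - 1, -1, -1):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a ^ bit_b:                 # first position where the bits differ
            return 1 if bit_a else -1     # the number holding the 1 is greater
        # Both bits are 0 or both are 1: move on to the next position.
    return 0


print(compare_msb_first(0b1100, 0b0101, width=4))
```

For the example in the text, the comparison is decided at the MSB and the later bits are never examined.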
Ts is loaded into a shift register. For each of 32 iterations
(the size of our Tc timestamp), the Tc timestamps are read
from the SRAM array 1 bit at a time using the ‘regular’
bit-serial peripheral interface, at the same time as the shift
register is shifted left to feed the comparison logic.
• If Tc[i] is 0 and Ts[i] is 1, Tc < Ts, the sbits need not
be updated and the comparison should stop. We latch
this output and use it to ignore further bit comparisons.
• If Tc[i] is 1 and Ts[i] is 0, Tc > Ts, i.e., the cache line
is newer than the Ts. The bit-line peripheral latches a
‘1’ and the latch output is used as the reset for the sbit.
Figure 5 shows the peripheral circuit attached to each SRAM
bitline. It requires two S-R latches, which are reset prior to
initiating the timestamp comparisons, and two 3-input AND
gates for the comparison operation, with the Tc bit being
fed to ‘b’ and the Ts bit to ‘a’.
The comparison should stop if Tc is determined to
be smaller than Ts, which is the result of the AND gate on the
right. To ignore further bit comparisons, the result is latched
using an S-R latch, and Q is fed to the AND gate on the
left.
At the end of the 32 iterations, the left-hand S-R latch is set
on every bitline for which Tc > Ts was determined. The bitline
drivers for those bitlines and the wordline for the sbit of the
corresponding hardware context are then enabled to write
a 0 into the sbits.
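The behavior of the two latches per bitline can be sketched in software. The names `newer_than_ts` and `reset_stale_sbits` are hypothetical, and the gate-level wiring of Figure 5 is not reproduced: the sketch only mirrors the rule that the first differing bit decides the comparison and that later bits are ignored. The inner loop stands in for work the hardware does on all bitlines at once.

```python
def newer_than_ts(tc_list, ts, width=32):
    """Return, per cache line, whether Tc > Ts using an MSB-first bit-serial scan."""
    n = len(tc_list)
    greater = [False] * n     # stands in for the latch recording Tc > Ts
    smaller = [False] * n     # stands in for the latch recording Tc < Ts
    for i in range(width - 1, -1, -1):        # one iteration per timestamp bit
        a = (ts >> i) & 1                     # bit shifted out of the Ts register
        for line in range(n):                 # all bitlines in parallel in hardware
            b = (tc_list[line] >> i) & 1
            undecided = not (greater[line] or smaller[line])
            if undecided and b == 1 and a == 0:
                greater[line] = True
            elif undecided and b == 0 and a == 1:
                smaller[line] = True
    return greater


def reset_stale_sbits(sbits, tc_list, ts, width=32):
    """Write 0 into the sbit of every line whose Tc > Ts latch ended up set."""
    mask = newer_than_ts(tc_list, ts, width)
    return [s and not m for s, m in zip(sbits, mask)]


print(reset_stale_sbits([True] * 4, tc_list=[5, 20, 10, 11], ts=10, width=8))
```

The outer loop runs once per timestamp bit regardless of how many lines there are, which is why the update time depends on the timestamp width and not on the cache size.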
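Figure 5: Bit-Line Peripheral. Each bitline of the comparator SRAM array feeds a sense amplifier (SA) and a peripheral circuit with inputs a, ~a (Ts bit) and b, ~b (Tc bit), two S-R latches with outputs Q and ~Q, and a shared reset.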
5. Evaluation
We implemented our defense in the GEM5 cycle-
accurate simulator [4] using L1I and L1D caches of 32KB
each and an L2 (LLC) cache of 2MB. We added a timestamp
and a per-hardware-context sbit to each cache line, which
are manipulated as described in Section 3.2. The process
context for a request packet in the cache is determined by
the CR3 register within the simulator. Changes in the CR3
register are used to trigger the timestamp comparisons and
the sbit saves and restores.
Table 1 specifies the real and simulation system param-
eters used for the evaluation.
TABLE 1: Evaluation Setup
Real Processor
  Core                       i7-7700, 3304.125
  L1D, L1I, L2, LLC cache    32K, 32K, 256K, 8192K
GEM5 Simulator
  Core                       TimingSimpleCPU, 2GHz
  L1D, L1I, LLC cache        32K, 32K, 2048K
The following subsections present an analysis and eval-
uation of the security and the performance overheads of our
timestamp-based defense on the GEM5 simulator.
5.1. Security Analysis
The attack depends on a fast reload due to another
process. If no process is allowed a cache hit due to another
process, the attack can be broken. If the first access is never
a cache hit, the attacker remains oblivious of the data being
cached beforehand and cannot learn if some shared data was
accessed by a victim. There is no channel left for the infor-
mation leakage using shared memory. The second access is
of no significance to the attacker. Allowing unaltered access
beyond the first access is sufficient to ensure security while
not significantly compromising performance. The additional
information tracked for the defense, namely the timestamps and
the sbits, is saved and restored by trusted software and is
protected from unprivileged access. The sbits are not shared
across execution contexts, and their access does not leak any
side channel information since it is not contingent on
sbit values.
5.1.1. Microbenchmark functionality evaluation. In order
to confirm the correct operation of the timestamp-based
approach, we created a microbenchmark attack consisting
of a pair of child and parent processes accessing a shared
memory-mapped array of size equal to 256 cache lines.
The parent process acts as the attacker, i.e., flushes the
shared array and yields the processor. The victim’s execution
follows, where it writes a value repeatedly to the shared
array. The parent process then wakes up and performs timed
reads of the entire array. A hit is considered a successful
attack. The attacker does not see any hit with our defense
simulation enabled in GEM5.
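if parent                  // attacker
    flush shrd_mem;
    sleep;                 // yield so that the child can run
    read shrd_mem;         // timed read; a cache hit is a successful attack
else                       // child (victim)
    read shrd_mem;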
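The expected outcome of this microbenchmark can be sketched with a toy model of the shared array. The function `run_microbenchmark` and the constants are hypothetical, the two processes are assumed to run on separate hardware contexts so that context-switch bookkeeping can be left out, and a boolean stands in for the timed read.

```python
NUM_LINES = 256
ATTACKER, VICTIM = 0, 1


def run_microbenchmark(defense_enabled):
    """Return how many of the attacker's timed reads look like cache hits."""
    sbits = {}                                  # addr -> contexts with sbit set

    def access(ctx, addr):
        owners = sbits.get(addr)
        if owners is None:
            sbits[addr] = {ctx}                 # miss: fill the line
            return False
        if defense_enabled and ctx not in owners:
            owners.add(ctx)                     # first access: delayed like a miss
            return False
        return True                             # observed as a fast hit

    for addr in range(NUM_LINES):               # parent flushes the shared array
        sbits.pop(addr, None)
    for addr in range(NUM_LINES):               # child touches the shared array
        access(VICTIM, addr)
    return sum(access(ATTACKER, addr) for addr in range(NUM_LINES))


print(run_microbenchmark(defense_enabled=False))
print(run_microbenchmark(defense_enabled=True))
```

Without the defense every timed read in the model hits; with it, every one of the attacker's first accesses is delayed and no hit is observed.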
5.1.2. Attacking RSA. We use the flush+reload tech-
nique to attack the GnuPG version of RSA, as described
in the original paper [32]. The attack was tested both on
real hardware and the GEM5 simulator, both running Linux.
The attacker is an independent program, sharing the same
machine and hence the caches.
On a real machine, we install a non-stripped GnuPG
library and locate the offsets for the Square, Multiply, and
Reduce functions. The shared library contains the
exponentiation routine used for encryption, which performs a sequence
of Square-Reduce-Multiply-Reduce when processing a set key bit
(value 1) and a sequence of Square-Reduce when processing
a clear bit. RSA encryption is an example where the control
flow through the shared library is indexed using secret
information, i.e., in this case, bit values from the secret key.
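The dependence of the call sequence on the key bits can be sketched with a textbook left-to-right square-and-multiply loop. This is not GnuPG's code: `modexp_trace` and `recover_bits` are hypothetical helpers, and the trace list stands in for what an attacker would infer from cache hits on the Square, Multiply, and Reduce functions.

```python
def modexp_trace(base, exponent, modulus):
    """Compute base**exponent % modulus and record the function-call sequence."""
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:               # exponent bits, MSB first
        result = (result * result) % modulus    # Square, then Reduce
        trace += ["Square", "Reduce"]
        if bit == "1":
            result = (result * base) % modulus  # Multiply, then Reduce
            trace += ["Multiply", "Reduce"]
    return result, trace


def recover_bits(trace):
    """Rebuild the exponent bits from an observed call sequence."""
    bits = []
    i = 0
    while i < len(trace):
        if trace[i + 2:i + 3] == ["Multiply"]:
            bits.append("1")                    # Square-Reduce-Multiply-Reduce
            i += 4
        else:
            bits.append("0")                    # Square-Reduce
            i += 2
    return "".join(bits)


value, trace = modexp_trace(7, 0b101, 13)
print(value, recover_bits(trace))
```

An observer who learns only the order of the calls can reconstruct the exponent, which is why access patterns into the shared library are sensitive.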
In the original attack, the attacker flushes the cache and
then accesses the memory location for the Square, Multiply,
and Reduce functions in a loop, using the time to process a
1 or 0 bit coupled with whether or not accesses hit in the
cache to extract information about the key being used. In
our evaluation, we simplify the attack and assume a cache
hit in the attacker process represents a successful attack.
We measure the time required for cached and uncached
accesses on the real machine and use these measurements to set
the threshold for classifying an access as a cache hit. The attacker program is
an independent program running a loop to flush and read
memory. Reading the timestamps must be fenced/ordered
with respect to the memory access being timed to avoid
speculative loads. The attack goes through, i.e., the indepen-
dent attacker program gets hits for its accesses as a simulta-
neously running victim process performs an encryption. We
are able to launch the attack both on a real machine and in
GEM5 full-system emulation mode.
Our defense in GEM5 disallows any cache hit in the
attacker process since the attacker’s timed access is preceded
by a flush. The defense allows a cache hit in a process only
if it has suffered a cache miss for its first access. Since the
access to cached data after the flush is a first access,
which is delayed, the attacker does not perceive a hit. This
attack was the key demonstration of the flush+reload
technique, and our defense successfully breaks it.
5.2. Performance Evaluation
5.2.1. First Access Delay. We evaluate the performance
overhead of our first-access delay mechanism by running
SPECCPU2006 benchmarks in system-emulation mode. The
benchmarks are run for 1 billion instructions and we record
the overall increase in execution time (Figures 6a and 6b).
The number of first accesses when running a single
benchmark on a single core is affected by the sharing of both
benchmark-specific code and shared libraries, as well as by
the shared L1 cache. Figure 6a plots the overheads due to
misses on first accesses; the exact overheads are presented
in Table 2. The lower miss rates with the defense are due to
the increased number of total accesses to the LLC from the
extra first access misses.
Figure 7 shows the ratio of first accesses to the total
number of accesses at each cache level for different
benchmarks. As can be seen, mcf, omnetpp, and perlbench have
a higher fraction of first access misses, resulting in the
higher performance overhead seen in Figure 6a. On the other
hand, libquantum and wrf have a very low fraction of first
access misses, so their performance is not impacted by the
defense. An interesting observation is that both mcf and
perlbench have higher fractions of first access misses in the
last-level cache when run individually, and high overhead
when run together. However, when they share the LLC with
any other application, their effective first access misses
are lower because of cache contention, resulting in a lower
performance penalty. The astar perl and mcf gromac workloads
likewise have fewer effective first access misses, due to
capacity evictions when sharing the cache, and hence lower
overhead.
Figure 6b shows normalized execution time when running
two different benchmarks simultaneously on two different
cores. The average overhead for the two-benchmark workloads
is 2.17%. When different workloads occupy the last-level
cache, the overlap in common accesses, and hence the number
of first accesses, is limited to shared libraries.
Benchmarks in which several processes access shared
libraries have a higher overhead due to delayed accesses,
since the caching benefits are not fully leveraged from one
process to another. Thus, benchmarks like perlbench and
omnetpp, which simulate network workloads by spawning
several processes and using shared libraries among the
processes, have a larger slowdown than single processor
benchmarks like libquantum.
5.2.2. LLC Size Sensitivity Analysis. To analyze the sen-
sitivity of our design to cache size, we evaluate the per-
formance overhead with different LLC sizes (Figure 8).
Since the bigger caches are expected to have lower eviction
rates for the same workload, there are effectively fewer first
accesses, resulting in a smaller additional delay. Hence, we
see the performance overhead in bigger caches to be smaller.
Figure 6: Performance overhead due to delayed first accesses, shown as
normalized execution time (L1-64KB, LLC-2MB) for (a) individual
benchmarks and (b) two benchmarks. The average overhead for single
benchmarks is 2.2% and it is 2.17% for two benchmarks running together
with L1 and LLC.
Figure 7: Delayed Access Miss Ratio at Each Cache Level (first access
miss ratio per benchmark and on average, for l1d, l1i, and llc)
Our analysis with 2MB, 4MB, and 8MB LLC sizes shows
average performance overheads of 2.5%, 1.6%, and 1.7%,
respectively. The ratio of first access misses to overall
misses also decreases as the cache size grows: it is 40%,
10%, and 8% for the 2MB, 4MB, and 8MB cache sizes,
respectively. These numbers indicate that the defense scales
well with larger caches.
5.3. Area Overhead and Scaling
The increase in area due to the additional hardware comes
primarily from the separate SRAM array of timestamps and
sbits, and from the comparison logic. This separate SRAM
array uses 8-T rather than 6-T cells and also includes an
additional set of sense-amps and bit-line drivers. The other
components required are the timestamp comparison logic at
each bit-line peripheral, consisting of two latches and two
3-input AND gates, and a shift register to hold Ts.
TABLE 2: SPEC2006 Execution Time Overhead and Miss Rate (%), LLC (2MB)
Workload        Overhead   MissRate (Baseline)   MissRate (FA Miss)
perlbench       1.068       8.35                  7.65
bwaves          1.034      78.72                 56.10
mcf             1.100      39.58                 34.47
milc            1.009      66.55                 65.03
gromacs         1.019       5.35                  3.75
cactusADM       1.024      45.51                 30.53
leslie3d        1.045      49.06                 40.25
namd            1.007       6.81                  4.50
gobmk           1.024      12.78                 11.30
soplex          1.024       3.37                  2.91
calculix        1.012       5.70                  4.67
hmmer           1.007       0.42                  0.27
sjeng           1.010      72.67                 62.96
libquantum      1.000      99.65                 99.59
lbm             1.008      71.68                 66.37
omnetpp         1.065       0.58                  3.01
astar           1.007       2.77                  2.32
wrf             1.009      25.83                 22.39
Average         1.022      33.07                 28.78
gobmk sjeng     1.024      44.21                 38.31
milc cactus     1.009      65.72                 56.34
lbm astar       1.018      76.91                 72.36
namd hmmer      1.007       3.74                  2.48
astar perl      1.008       9.68                  8.11
gromac leslie   1.021      45.23                 36.45
mcf gromac      1.021      36.10                 30.51
hmmer libq      1.008      74.31                 65.91
perl mcf        1.058      27.94                 23.29
leslie gobmk    1.024      43.98                 36.62
omnetpp wrf     1.010       8.32                  9.03
bwave milc      1.039      78.59                 66.49
wrf bwave       1.011      56.36                 43.85
cactus namd     1.008      36.28                 24.48
libq omnetpp    1.055      26.57                 20.89
Average         1.021      43.26                 35.67
In our evaluation, we use 32-bit timestamps to keep
the area overhead low. Thus, there is an additional 32
bits per 64-byte cache line. The number of bits used for
the timestamp counter has an impact on the frequency of
timestamp rollover and is a parameter that can be controlled.
Timestamp rollover results in extra misses due to unneces-
sary sbit resets to retain correctness, which can result in
performance loss.
An sbit is required per hardware context that shares the
cache for each cache line. The total number of sbits can be
significant for the LLC in server-class processors. In order
to keep the number of sbits low, design principles used for
coherence directories could be applied, for example, limited
pointers [2]. The limited pointer [2] directory design work
demonstrated empirically that applications typically share
data across a few processors. Since pointers require log(n)
bits (for n hardware contexts), keeping track of a limited
number of sharers would reduce area overhead to O(log(n))
as opposed to n bits per cache line.
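The trade-off can be made concrete with a small calculation. The sketch below compares a full sbit vector (one bit per hardware context) with a limited-pointer encoding; the choice of four pointers per line is a hypothetical parameter chosen for illustration, not a value taken from our evaluation.

```python
def full_vector_bits(n_contexts):
    """One sbit per hardware context for each cache line."""
    return n_contexts


def limited_pointer_bits(n_contexts, n_pointers):
    """Bits per cache line when tracking at most `n_pointers` sharers.

    Each pointer needs ceil(log2(n_contexts)) bits to name a context.
    """
    pointer_width = max(1, (n_contexts - 1).bit_length())
    return n_pointers * pointer_width


if __name__ == "__main__":
    for n in (8, 64, 256):
        print(n, full_vector_bits(n), limited_pointer_bits(n, n_pointers=4))
```

With four pointers, the limited-pointer encoding needs 24 bits per line for 64 contexts and 32 bits for 256 contexts, compared with 64 and 256 bits for the full vector. For only 8 contexts it needs 12 bits, more than the 8-bit vector, so the scheme pays off only when the number of contexts is large.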
5.4. sbits Save and Restore Overhead
When a process is resumed, the sbits and the Ts that
were saved for the process at the time of preemption must
be restored. The overhead of copying the sbits is low for
small cache sizes: the entire sbit array for a 64KB L1 cache
can be copied in two 64-byte (cache-line-sized) memory
accesses. The overhead scales with the size of the cache;
the copy takes 256 memory accesses for an 8MB last-level
cache (Table 3). The sbits can be read and written in
parallel via the ’regular’ bit-line interface when a save or
restore is required at a context switch. The save and restore
are done to and from a kernel memory region reserved for
the sbits, to which the process context points.
TABLE 3: sbit size relative to the cache size
cache size (B)   sbits    accesses
64               1bit     1
64K              128B     2
256K             512B     8
8M               16KB     256
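The entries in Table 3 follow from one sbit per 64-byte cache line for a given hardware context, copied in 64-byte memory accesses. A minimal sketch of that arithmetic, in which the function name sbit_copy_cost is hypothetical:

```python
LINE_BYTES = 64  # cache line size, which is also the size of one memory access


def sbit_copy_cost(cache_bytes, line_bytes=LINE_BYTES):
    """Return (number of sbits, 64-byte accesses needed to copy them).

    Assumes one sbit per cache line for the context being saved or restored.
    """
    sbits = cache_bytes // line_bytes
    bits_per_access = line_bytes * 8
    accesses = -(-sbits // bits_per_access)  # ceiling division
    return sbits, accesses


if __name__ == "__main__":
    for label, size in (("64", 64), ("64K", 64 << 10),
                        ("256K", 256 << 10), ("8M", 8 << 20)):
        sbits, accesses = sbit_copy_cost(size)
        print(f"{label:>5}: {sbits} sbits, {accesses} accesses")
```

For an 8MB cache this gives 131072 sbits (16KB) and 256 accesses, matching the last row of the table.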
Figure 8: Sensitivity Analysis (normalized execution time for 2MB,
4MB, and 8MB LLC sizes)
On an Intel i7-7700 processor operating at 3.6 GHz, the
time to copy the sbits for an 8MB cache without caching is
2.4 µs. This is of comparable magnitude to a null context
switch or system call. A typical process time slice varies
from 1 ms to several ms, so the 2.4 µs overhead is at most
0.24% of the process run-time. An extra layer of buffering
in hardware could allow the copy to be performed in parallel
with the execution of the next process.
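The 0.24% bound is the ratio of the copy time to the shortest time slice. A small sketch of the calculation follows; the 4 ms and 10 ms slices are illustrative values, not measurements.

```python
def copy_overhead_percent(copy_us, slice_ms):
    """Percentage of one time slice spent copying sbits once per slice."""
    return 100.0 * copy_us / (slice_ms * 1000.0)


if __name__ == "__main__":
    for slice_ms in (1, 4, 10):
        pct = copy_overhead_percent(2.4, slice_ms)
        print(f"{slice_ms} ms slice: {pct:.3f}%")
```

Longer time slices amortize the same 2.4 µs copy over more execution time, so the relative overhead only decreases from the 1 ms case.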
6. Related Work
Existing solutions for protecting against cache side chan-
nel attacks that exploit shared libraries either resort to cache
partitioning or remove timing information from the accesses.
Both approaches incur significant overhead.
6.1. Dynamic Cache Partitioning
Statically partitioning caches causes significant per-
formance deterioration as some parts of the cache become
unavailable to other processes [22], whereas dynamic cache
partitioning can achieve lower overheads by reallocating
space as needed. SecDCP [29] is one such dynamic parti-
tioning technique, which broadly categorizes applications as
either ‘confidential’ or ‘public’ and prevents any information
leakage from confidential applications to public applications,
but allows information flow in the other direction. Although
this dynamic cache partitioning technique performs better
than its static counterpart, it provides a very coarse-grained
security classification [29]. Another dynamic cache parti-
tioning technique utilizes page coloring to allocate pages to
a secure domain [27], [33], but may incur significant copy
costs for recoloring.
6.2. Using Intel CAT-based Partitioning
Both Catalyst [20] and Apparition [7] have demonstrated
the use of Intel’s cache allocation technology (CAT) to
achieve cache partitioning for mitigating cache side
channels. Both Apparition and Catalyst disallow or do not
protect against attacks using shared software. The
performance of these systems depends on their ability to
reassign caches to different applications and keep cache
flushes to a minimum. Apparition [7] uses one Class of
Service (CLOS) per application and flushes it across context
switches. Catalyst uses pinned pages to provide a solution
suited to cloud service providers. This defense mechanism is
suited to preventing cross-VM attacks and attacks targeted at
the LLC, and is not suited to higher-level caches. The design
further requires manually tagging pages that should be
pinned or need to remain secure.
6.3. Removing Time & Constant Time Algorithm
The ability to time data accesses precisely can also be
seen as the cause of side channel exploits. However, taking
this ability away from untrusted applications is not
sufficient to prevent the attacks, because several new
techniques can obtain timestamps at up to microsecond
granularity. These methods provide alternate timing
primitives or recover the clock resolution [25] on systems
that obfuscate time by reducing the clock resolution.
Other approaches to mitigating side channels in shared
memory suggest program transformation for constant time
implementation. These program transformations have
impractical overhead because they make each critical access
O(n) [9], and they are not useful for large shared libraries.
7. Discussion
Sharing software is an important component of com-
puting systems for efficiency and consistency. This work
eliminates a channel for the leak of secret data via mon-
itoring a victim’s access to shared content using shared
caches. In the absence of shared content, shared caches
still allow a victim’s access behavior to be monitored, but
the information channel is far less accurate. In particular,
a “Prime+Probe” attack fills (primes) an entire cache set,
and infers the cache set accessed by the victim, based
on whether the attacker’s probe hits or misses. Proposed
defenses for a “Prime+Probe” attack include a randomizing
cache [24] [19]. These defenses do not work for attacks
against shared content, which provides a more accurate/less
noisy channel of information. Our solution in conjunction
with these defenses can provide a more complete defense.
Other approaches to defending against more recent at-
tacks like Spectre either stall execution, or make speculative
instructions invisible to succeeding load requests [30] [16].
They do not prevent non-speculative cache side channels.
Speculative side channel attacks rely on conventional side
channels to leak speculatively loaded data to the attacker.
By breaking conventional cache attacks, we therefore also
prevent speculative side channel leaks.
8. Conclusion
We have designed and evaluated a timestamp-based
defense against timing side channel attacks that rely on
fast reload of shared memory in caches to learn secret
information. Our design prevents attacks from cross-core,
same core, or SMT contexts, and at any level of cache, with-
out the need for cache partitioning. To perform timestamp
comparisons in parallel, we designed a hardware SRAM
array that allows bit-serial, timestamp-parallel comparison
with easy transposed access. We have evaluated the de-
fense against microbenchmark attack programs and the clas-
sic flush+reload attack using the GEM5 simulator. On
SPEC2006 benchmarks that create multiple processes, the
performance overhead due to delaying the first accesses
is 2.17% on average, and copying process-specific sbits
adds at most 0.24% even when there is a context switch
every millisecond. Our defense against timing side channels
through shared software retains the benefits of allowing
processes to utilize the entire cache capacity of a shared
cache and allows cache and memory pressure reduction
through data deduplication.
9. Acknowledgements
This work was supported in part by National Sci-
ence Foundation (NSF) Awards CNS-1618497 and CNS-
1900803. We thank Sreepathi Pai for his feedback during
early discussions of the ideas in this paper.
References
[1] Kernel samepage merging (memory deduplication).
https://kernelnewbies.org/Linux_2_6_32#Kernel_Samepage_Merging_.28memory_deduplication.29, 2017.
[2] A. Agarwal, R. Simoni, J. Hennessy, and M. Horowitz. An eval-
uation of directory schemes for cache coherence. In International
Symposium on Computer Architecture (ISCA), pages 280–289, June
1988.
[3] Kenneth E. Batcher. Bit-serial parallel processing systems. IEEE
Transactions on Computers, (5):377–384, 1982.
[4] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Rein-
hardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower,
Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell,
Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood.
The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7,
August 2011.
[5] Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang
Lin, and Ten H Lai. Sgxpectre attacks: Stealing intel secrets from sgx
enclaves via speculative execution. arXiv preprint arXiv:1802.09085,
2018.
[6] Craig Disselkoen, David Kohlbrenner, Leo Porter, and Dean Tullsen.
Prime+abort: A timer-free high-precision l3 cache attack using intel
TSX. In 26th USENIX Security Symposium (USENIX Security
17), pages 51–67, 2017.
[7] Xiaowan Dong, Zhuojia Shen, John Criswell, Alan L Cox, and
Sandhya Dwarkadas. Shielding software from privileged side-channel
attacks. In 27th USENIX Security Symposium (USENIX Security
18), pages 1441–1458, 2018.
[8] Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subra-
maniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, and Reetuparna
Das. Neural cache: Bit-serial in-cache acceleration of deep neural
networks. In Proceedings of the 45th Annual International Symposium
on Computer Architecture, pages 383–396. IEEE Press, 2018.
[9] Oded Goldreich and Rafail Ostrovsky. Software protection and simu-
lation on oblivious rams. Journal of the ACM (JACM), 43(3):431–473,
1996.
[10] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan
Mangard. Flush+flush: a fast and stealthy cache attack. In
International Conference on Detection of Intrusions and Malware, and
Vulnerability Assessment, pages 279–299. Springer, 2016.
[11] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template
attacks: Automating attacks on inclusive last-level caches. In 24th
USENIX Security Symposium (USENIX Security 15), pages 897–
912, 2015.
[12] David Gullasch, Endre Bangerter, and Stephan Krenn. Cache games–
bringing access-based cache attacks on aes to practice. In 2011 IEEE
Symposium on Security and Privacy, pages 490–505. IEEE, 2011.
[13] Andrew Herdrich, Edwin Verplanke, Priya Autee, Ramesh Illikkal,
Chris Gianos, Ronak Singhal, and Ravi Iyer. Cache qos: From concept
to reality in the intel® xeon® processor e5-2600 v3 product family. In
2016 IEEE International Symposium on High Performance Computer
Architecture (HPCA), pages 657–668. IEEE, 2016.
[14] Wei-Ming Hu. Lattice scheduling and covert channels. In Proceedings
1992 IEEE Computer Society Symposium on Research in Security and
Privacy, page 52. IEEE, 1992.
[15] Keren Jin and Ethan L. Miller. The effectiveness of deduplication on
virtual machine disk images. In Proceedings of SYSTOR 2009: The
Israeli Experimental Systems Conference, SYSTOR 09, New York,
NY, USA, 2009. Association for Computing Machinery.
[16] Khaled N Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu
Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-
Ghazaleh. Safespec: Banishing the spectre of a meltdown with
leakage-free speculation. arXiv preprint arXiv:1806.05179, 2018.
[17] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike
Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael
Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative
execution. arXiv preprint arXiv:1801.01203, 2018.
[18] Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu
Song, and Nael Abu-Ghazaleh. Spectre returns! speculation attacks
using the return stack buffer. In 12th USENIX Workshop on Offensive
Technologies (WOOT 18), Baltimore, MD, August 2018. USENIX
Association.
[19] F. Liu, H. Wu, K. Mai, and R. B. Lee. Newcache: Secure cache archi-
tecture thwarting cache side-channel attacks. IEEE Micro, 36(5):8–16,
Sep. 2016.
[20] Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas,
Gernot Heiser, and Ruby B Lee. Catalyst: Defeating last-level cache
side channel attacks in cloud computing. In 2016 IEEE international
symposium on high performance computer architecture (HPCA),
pages 406–418. IEEE, 2016.
[21] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee.
Last-level cache side-channel attacks are practical. In 2015 IEEE
Symposium on Security and Privacy, pages 605–622. IEEE, 2015.
[22] D Page. Partitioned cache architecture as a side-channel defence
mechanism. 2005.
[23] Moinuddin K Qureshi. Ceaser: Mitigating conflict-based cache at-
tacks via encrypted-address and remapping. In 2018 51st Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO),
pages 775–787. IEEE, 2018.
[24] Moinuddin K Qureshi. New attacks and defense for encrypted-
address cache. In Proceedings of the 46th International Symposium
on Computer Architecture, pages 360–371. ACM, 2019.
[25] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan
Mangard. Fantastic timers and where to find them: high-resolution
microarchitectural attacks in javascript. In International Conference on
Financial Cryptography and Data Security, pages 247–267. Springer,
2017.
[26] Michael Schwarz, Martin Schwarzl, Moritz Lipp, Jon Masters, and
Daniel Gruss. Netspectre: Read arbitrary memory over network. In
European Symposium on Research in Computer Security, pages 279–
299. Springer, 2019.
[27] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. Limiting
cache-based side-channel in multi-tenant cloud using dynamic page
coloring. In 2011 IEEE/IFIP 41st International Conference on
Dependable Systems and Networks Workshops (DSN-W), pages 194–
199. IEEE, 2011.
[28] Daimeng Wang, Ajaya Neupane, Zhiyun Qian, Nael B Abu-Ghazaleh,
Srikanth V Krishnamurthy, Edward JM Colbert, and Paul Yu. Unveil-
ing your keystrokes: A cache-based side-channel attack on graphics
libraries. In NDSS, 2019.
[29] Yao Wang, Andrew Ferraiuolo, Danfeng Zhang, Andrew C Myers,
and G Edward Suh. Secdcp: secure dynamic cache partitioning for
efficient timing channel protection. In Proceedings of the 53rd Annual
Design Automation Conference, page 74. ACM, 2016.
[30] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison,
Christopher Fletcher, and Josep Torrellas. Invisispec: Making specu-
lative execution invisible in the cache hierarchy. In 2018 51st Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO),
pages 428–441. IEEE, 2018.
[31] Mengjia Yan, Bhargava Gopireddy, Thomas Shull, and Josep Tor-
rellas. Secure hierarchy-aware cache replacement policy (sharp): De-
fending against cache-based side channel attacks. In 2017 ACM/IEEE
44th Annual International Symposium on Computer Architecture
(ISCA), pages 347–360. IEEE, 2017.
[32] Yuval Yarom and Katrina Falkner. Flush+reload: A high resolution,
low noise, l3 cache side-channel attack. In 23rd USENIX Security
Symposium (USENIX Security 14), pages 719–732, San Diego, CA,
August 2014. USENIX Association.
[33] Xiao Zhang, Sandhya Dwarkadas, and Kai Shen. Towards practical
page coloring-based multicore cache management. In Proceedings
of the 4th ACM European conference on Computer systems, pages
89–102. ACM, 2009.
[34] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart.
Cross-tenant side-channel attacks in paas clouds. In Proceedings of
the 2014 ACM SIGSAC Conference on Computer and Communications
Security, CCS ’14, pages 990–1003, New York, NY, USA, 2014.
Association for Computing Machinery.
