Strongly Secure and Efficient Data Shuffle On Hardware Enclaves by Chen, Ju et al.
arXiv:1711.04243v1 [cs.CR] 12 Nov 2017
Strongly Secure and Efficient Data Shuffle on Hardware Enclaves
Ju Chen
Syracuse University
Syracuse, New York 13244-1200
jchen133@syr.edu
Yuzhe (Richard) Tang
Syracuse University
Syracuse, New York 13244-1200
ytang100@syr.edu
Hao Zhou
Syracuse University
Syracuse, New York 13244-1200
hzhou09@syr.edu
Abstract
Mitigating memory-access attacks on the Intel SGX architecture
is an important and open research problem. A natural notion
of mitigation is cache-miss obliviousness, which requires that the
cache misses emitted during an enclave execution be oblivious
to sensitive data. This work realizes cache-miss obliviousness
for the computation of data shuffling. The proposed approach is
to software-engineer the oblivious algorithm of the Melbourne shuffle
[23] on the Intel SGX/TSX architecture, where the Transactional
Synchronization eXtension (TSX) is (ab)used to detect the occurrence
of cache misses. In building the system, we propose software
techniques that prefetch memory data prior to the TSX transaction to
defend against physical bus-tapping attacks. Our evaluation, based on
a real implementation, shows that our system achieves superior
performance and a lower transaction abort rate than the related work
in the existing literature.
CCS Concepts • Security and privacy → Side-channel analysis
and countermeasures;
Keywords Obliviousness, memory-access pattern attacks, Intel
SGX
ACM Reference format:
Ju Chen, Yuzhe (Richard) Tang, and Hao Zhou. 2017. Strongly Secure
and Efficient Data Shuffle on Hardware Enclaves. In Proceedings of
SysTEX’17: 2nd Workshop on System Software for Trusted Execution, Shanghai,
China, October 28, 2017 (SysTEX’17), 6 pages.
DOI: 10.1145/3152701.3152708
1 Introduction
Today we witness the emergence of hardware enclaves, a trusted
execution environment that protects a trusted user program against
the untrusted operating system. A notable example is the recently
released Intel Software Guard eXtension (SGX [4]), with increasing
adoption for secure public-cloud computing (e.g. in Microsoft
Azure [7] and Google Cloud Platform [2]). Various side-channel
attacks on hardware enclaves exploiting memory-access
patterns [12, 20, 28] have been proposed and demonstrated feasible
in practice. Defending against side-channel attacks on SGX-like enclave
architectures has become an important and open research problem.
A natural notion of the defense is cache-miss obliviousness: The
hardware enclave features a trusted processor issuing cache misses
to access the memory in the untrusted world. Security can be
assured by making the boundary crossing of cache misses oblivious
to the sensitive data. This is especially effective in defending against
physical attacks, e.g. by bus tapping [13], and software attacks,
e.g. the page-fault controlled side-channel attack [28]. In addition to
the strong security, cache-miss obliviousness helps induce better
performance, as cache-miss oblivious algorithms have lower time
complexity than the classic word-oblivious algorithms [22, 29] (see
§ 4.1 for performance discussion).
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
SysTEX’17, Shanghai, China
© 2017 ACM. 978-1-4503-5097-6/17/10...$15.00
DOI: 10.1145/3152701.3152708
This work realizes cache-miss obliviousness for data-shuffling
computation. Data shuffling is a basic operation used in
many analytical computations. Specifically, we consider the Melbourne
shuffle algorithm [23], which divides the data accesses in a
shuffle into 1) the oblivious ones to a large external storage and 2)
the non-oblivious ones to a small internal storage. We map the
internal storage to the cache inside the trusted processor and the external
storage to the untrusted world (e.g. memory and disk). By this means,
the cache misses, which only touch the untrusted memory, are made
oblivious and thus safe to be disclosed.
When engineering the above mapping paradigm on SGX, one
challenge is how to conceal the internal storage in cache with
assured isolation. We leverage the Intel Transactional Synchronization
eXtension [6], a hardware transactional memory feature in the latest
Intel Skylake processors. TSX was originally designed for efficient
concurrency. In this work, we propose a technique that enables a
TSX transaction to detect cache misses; the observation is that TSX
provides the capability of detecting early cache write-back (before
the transaction commits), and a cache miss can be detected if one can
equate cache misses with cache write-backs. The proposed technique
prefetches all memory referenced in a transaction and makes the
corresponding cache lines dirty, so that an eviction of any of them
triggers a write-back that can be detected.
In addition to cache-miss detection, we propose techniques to
avoid unnecessary transaction aborts by carefully aligning data in
memory and ensuring no conflict during data prefetch. Our technique
leverages the specific semantics of oblivious computation and
is distinct from compiler-based partitioning schemes such as
T-SGX [25].
We conduct algorithmic analysis to demonstrate the better
complexity of our hybrid external-oblivious algorithms. More
importantly, we conduct a performance study based on a real
implementation, with the hope of verifying the systems-level
performance advantage of cache-miss oblivious computation. The
systems-level performance study is necessary even in the presence of
algorithmic analysis, because the overhead induced by cache-miss
obliviousness, mostly due to TSX transaction execution, is higher
than that of word obliviousness. Our performance study in realistic
settings shows that the systems-level overhead of cache-miss
obliviousness is negligible in the presence of better time complexity,
and that cache-miss obliviousness yields overall better
performance than word-oblivious computation.
SysTEX’17, October 28, 2017, Shanghai, China Ju Chen, Yuzhe (Richard) Tang, Hao Zhou
2 Preliminary
2.1 Intel Software Guard eXtension (SGX)
Intel SGX is a security-oriented x86-64 ISA extension on the Intel
Skylake CPU, released in 2016. SGX provides a “security-isolated
world” for trustworthy program execution on an otherwise
untrusted hardware platform. At the hardware level, the SGX
secure world includes a tamper-proof SGX CPU which automatically
encrypts memory pages (in the so-called enclave region) upon
cache-line write-back. Instructions executed outside the SGX
secure world that attempt to read/write enclave pages only get to
see the ciphertext and cannot succeed. SGX’s trusted software
includes only unprivileged programs and excludes any OS kernel
code, by explicitly prohibiting system services (e.g. system calls)
inside an enclave.
To use the technology, a client initializes an enclave by uploading
the in-enclave program and uses SGX’s seal and attestation
mechanism [11] to verify the correct setup of the execution
environment (e.g. by a digest of enclave memory content). During
program execution, the enclave is entered and exited proactively
(by SGX instructions, e.g. EENTER and EEXIT) or passively (by
interrupts or traps). These world-switch events trigger context
saving/loading at both the hardware and software levels. Compared with
prior TEE solutions [1, 3, 5, 8], SGX uniquely supports multi-core
concurrent execution, dynamic paging, and interrupted execution.
2.2 Intel Transactional Synchronization eXtension (TSX)
The purpose of TSX is to enable atomic execution of a code block,
or transaction, from other processors’ viewpoint. This goal entails
two requirements: 1) During its execution, the transaction is fully
contained inside a processor and its memory-access requests can
all be resolved inside the data cache. In other words, the
transaction’s writes are buffered in dirty cache lines without being
reflected in memory. 2) At the end of the transaction’s execution, the
cached writes are written back, and the transaction
can be committed only when there is no data conflict with another
processor. A conflicting data access occurs when a location
is accessed by two processors and at least one access is a
transactional write.
To realize the two requirements, the TSX hardware supports
aborting the execution of a transaction under various
causes. It aborts a transaction when a data conflict is detected at
transaction commit time (abort cause AC1). In order to detect data
conflicts, the hardware needs to track both the readset and the writeset
of a transaction. The writeset needs to be kept inside the L1 data
cache (L1D) and the readset needs to be inside the L3 cache. Thus, it
aborts the transaction when dirty data-cache lines are evicted,
triggering cache write-back, before the end of the transaction (AC2).
It also aborts when the readset exceeds the L3 cache (AC3). In
addition, it aborts upon various system events, such as page faults,
interrupts and other exceptions delivered to the processor (AC4).
While TSX was originally designed for performance and
programmability in multiprocessing, its capabilities can be (ab)used
for security purposes: It can be used to realize cache-based
or register-based computation [15] and to protect a private key
from leaving a processor [16] by leveraging the TSX capability of
detecting early cache write-back. T-SGX [25] defends against
page-fault side-channel attacks by leveraging the TSX capability that
page-fault events are intercepted by the TSX abort handlers before
the untrusted OS. This work uses TSX for detecting cache misses
and for defending against side-channel attacks. The use of TSX in this
work is elaborated below.
    x_begin
    ...
  Prefetch:
    LD R15, 0x8231
    ST 0x8231, R15
    ...
  Original data-access:
    LD RCX, 0x8231
    ...
    x_end
[Figure annotations: execution timeline of the TSX transaction; TSX guarantee: no cache write-back of 0x8231.]
Figure 1. Intel TSX (ab)used to detect cache misses: The hack
here is that no cache write-back of a dirty line means the line stays
present in the cache, implying a cache hit upon a memory request.
2.3 Use of TSX for Detecting Cache-Miss
Our work requires concealing the access-leaky internal storage in
cache. No access to the internal storage may be resolved by a
cache miss, which would otherwise leak the sensitive access.
Concealing the internal storage in the last-level cache (LLC) is
equivalent to ensuring no LLC miss during the time period when
the internal storage is being accessed. Thus, the key requirement
of our work is to ensure that no LLC miss is caused by
internal-storage accesses.
TSX as is, however, does not provide this capability. For
instance, a cache miss in a transaction does not necessarily cause
the transaction to abort. A cache miss aborts a TSX
transaction when it replaces a line that is read or written by the same
transaction. The key insight of this work is to prefetch the entire
read/write set of a transaction, such that the actual accesses must be
served by cache hits (without leaky cache misses). More concretely,
TSX guarantees that any attempt to replace a prefetched cache line
aborts the transaction. With unreplaceable prefetched lines,
the actual accesses are guaranteed to be served by cache hits.
For example, consider the memory reference LD
RCX,0x8231 in the code sample in Figure 1. The memory
reference can be resolved by a cache hit or a miss. To ensure no chance
of a cache miss, we prefetch the data to the LLC at the beginning of
the transaction (i.e. “LD R15,0x8231”). After this instruction, the
LLC cache line buffering the content at 0x8231 is recorded into
the readset of this transaction and is “pinned” there; TSX
guarantees that any attempt to replace the LLC line will abort the transaction,
which is then captured by the TSX abort handler. In other words, if
the transaction has not aborted when the execution reaches the
instruction “LD RCX,0x8231”, the prefetched cache line is still present at
least in the LLC, and the memory reference cannot cause an LLC
miss or any traffic on the system bus.
In general, the capability of prefetching a transaction’s read/write
set and pinning it to unreplaceable cache lines can help
mitigate various memory-access attacks, including physical bus
tapping and cache-timing attacks. Bus tapping is mitigated because
pinned cache lines guarantee cache hits. The cache-timing attacks
are mitigated due to sharing cache lines between transactions.
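The pinning argument of this subsection can be illustrated with a toy model. The sketch below is not how TSX is implemented in hardware; it assumes a small set-associative LRU cache in which every line touched inside a transaction is pinned, and in which evicting a pinned line raises an abort. The names ToyCache, TransactionAbort, begin, end and access are hypothetical and chosen only for this illustration.

```python
from collections import OrderedDict


class TransactionAbort(Exception):
    """Raised when a pinned (transactional) line would be evicted."""


class ToyCache:
    """Toy set-associative LRU cache with TSX-like pinning (illustration only)."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: line number -> pinned flag, LRU entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.in_tx = False

    def begin(self):
        self.in_tx = True

    def end(self):
        # Leaving the transaction (commit or abort) unpins every line.
        self.in_tx = False
        for cache_set in self.sets:
            for line in list(cache_set):
                cache_set[line] = False

    def access(self, addr):
        """Touch addr; return 'hit' or 'miss', or abort if a pinned line must go."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            cache_set[line] = cache_set[line] or self.in_tx
            return "hit"
        if len(cache_set) >= self.ways:
            victim, pinned = next(iter(cache_set.items()))  # LRU line
            if pinned:
                self.end()
                raise TransactionAbort(f"line {victim:#x} is pinned")
            del cache_set[victim]
        cache_set[line] = self.in_tx
        return "miss"


cache = ToyCache()
cache.begin()
cache.access(0x8231)           # prefetch: brings the line in and pins it
second = cache.access(0x8231)  # the "original" access is served by a hit
cache.end()
```

In this model, a transaction that reaches its original access without raising TransactionAbort has necessarily been served from the cache, which mirrors the reasoning around Figure 1.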
Strongly Secure and Efficient Data Shuffle on Hardware Enclaves SysTEX’17, October 28, 2017, Shanghai, China
3 System Design and Implementation
3.1 Threat Model
We mainly consider a memory-access attacker who either directly
sniffs the out-of-process memory-access traffic (e.g. by bus
tapping or by the page-fault channel [28]) or indirectly monitors the
side channel of cache timing [12].
The non-goals of this work include the following attacks. 1)
This work is complementary to defenses against rollback attacks, which load sealed but
stale data across power cycles and restore the system to a stale
state. 2) Given that memory stores both data and code, this work focuses
on data memory. Code-access attacks, which exploit
the access pattern to the memory region storing code, are orthogonal
and can be defended against by existing techniques [21, 25] on a small
memory. 3) We do not consider other side-channel attacks exploiting
timing information or power usage [10]. 4) We do not consider
denial-of-service attacks in which the adversary declines to serve the
requests from the enclave.
3.2 Security Definition
Intuitively, memory-access obliviousness states that the
memory-access trace of a program execution is independent of any
computation data (including both input and intermediate data).
Consider the execution of a program P with data input I. The
execution produces the memory-access trace T, which consists of all the
last-level cache misses. Obliviousness requires that, given any two
data values I_0 and I_1, an oblivious execution produces the same
trace, that is, T_P(I_0) = T_P(I_1). This definition assumes deterministic
computation induced by P and is about “perfect” obliviousness,
in the sense that it requires the traces under different input data
to be exactly the same. Due to the systems nature of this work, we
skip the more formal and generic definition of cache-miss
obliviousness (e.g. based on an indistinguishability formulation [18]).
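The definition can be exercised on toy programs. The sketch below is a simplification: it records every array read and write at word granularity rather than last-level cache misses, and it only compares traces over a finite list of inputs. All function names are hypothetical. It contrasts a compare-exchange bubble sort, whose accesses do not depend on the data, with an insertion sort, whose accesses do.

```python
def access_trace(program, data):
    """Run program on a copy of data; return the list of (op, index) accesses."""
    trace = []

    class TracedList(list):
        def __getitem__(self, index):
            trace.append(("R", index))
            return super().__getitem__(index)

        def __setitem__(self, index, value):
            trace.append(("W", index))
            super().__setitem__(index, value)

    program(TracedList(data))
    return trace


def is_oblivious(program, inputs):
    """True if T_P(I) is identical for every input I in inputs."""
    traces = [access_trace(program, data) for data in inputs]
    return all(trace == traces[0] for trace in traces)


def oblivious_bubble_sort(arr):
    """Always reads and writes the same positions, whatever the values are."""
    n = len(arr)
    for _ in range(n):
        for j in range(n - 1):
            a, b = arr[j], arr[j + 1]
            arr[j], arr[j + 1] = min(a, b), max(a, b)


def insertion_sort(arr):
    """Stops early on sorted prefixes, so its accesses depend on the data."""
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            j -= 1


same_length_inputs = [[3, 1, 2], [1, 2, 3], [2, 3, 1]]
bubble_ok = is_oblivious(oblivious_bubble_sort, same_length_inputs)
insertion_ok = is_oblivious(insertion_sort, same_length_inputs)
```

Passing this check on a few inputs does not prove obliviousness; it only fails to find a counterexample among the inputs tried.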
3.3 Step 1: Mapping Melbourne Shuffle to SGX
In this work, we focus on implementing cache-miss oblivious data
shuffling. Data shuffle is a fundamental operator in oblivious
data-analysis computation. We describe the engineering of the Melbourne
shuffle [23] on Intel SGX. The idea is to map the program of the
Melbourne shuffle to TSX transactions and to isolate the leaky data
accesses in cache by abusing the TSX capability. Note that this work only
considers single-threaded execution.
Preliminary of Melbourne shuffle [23]: The Melbourne shuffle
is an oblivious, randomized algorithm for data shuffle. Given a data
array and a permutation (of the same length), a
data shuffle produces an array that reorders the data array based on
the permutation. Internally, the Melbourne shuffle works in two
data scans, or passes. The first pass, called distribution, scans
the data array at the granularity of size-√N buckets and reorders
individual buckets non-obliviously within an internal memory of
size p log N √N. The second pass scans the array and sorts the reorganized
buckets internally. Figure 2a illustrates an example of the Melbourne
shuffle. Overall, the trusted memory is log N √N with array length
N. The details can be found in the original paper [23].
The Melbourne shuffle is mapped to TSX transactions such that
the accesses to internal storage are kept inside transactions while
external oblivious data accesses are kept outside transactions. The
mapping is illustrated in Figure 3, where
the bucket-wise permutation multiplication in the “distribute” pass
and the bucket-wise sort in the “cleanup” pass are mapped to
individual transactions.
One implication of this mapping is that the internal storage of
size log N √N must be smaller than the L1 cache, which
is one factor that constrains the scalability of CMOS on real SGX
hardware (see § 4.1).
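A back-of-the-envelope check of this constraint is sketched below. The assumptions are not stated in the text and are made only for illustration: 4-byte integers, a base-2 logarithm, no constant factor in front of log N √N, the whole 32 KB L1 being available to the transaction, and array lengths restricted to powers of 4 so that √N is an integer. The function names are hypothetical.

```python
import math


def internal_storage_bytes(n, elem_bytes=4):
    """Bytes needed for log2(n) * sqrt(n) elements (assumed model)."""
    return math.ceil(math.log2(n) * math.sqrt(n)) * elem_bytes


def max_array_length(l1_bytes=32 * 1024, elem_bytes=4):
    """Largest power-of-4 length whose internal storage still fits in L1."""
    n = 4
    while internal_storage_bytes(n * 4, elem_bytes) <= l1_bytes:
        n *= 4
    return n


limit = max_array_length()            # 65536 under the assumptions above
needed = internal_storage_bytes(limit)  # 16384 bytes
```

Under these assumptions an array of 65536 integers needs 16 KB of internal storage, while the next power of 4 (262144) would need 36 KB and no longer fits in a 32 KB L1.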
3.4 Step 2: Isolating Cached Data by TSX
Isolating cached data is realized by data prefetching, which simply
prefetches all data referenced inside a transaction. Given a
prefetched line, the TSX capability guarantees that the line
will not be replaced, at least from the LLC, during the transaction. Our goal in
this work is to avoid self-eviction, that is, to ensure that the prefetched lines
do not conflict with each other. Here, we consider both conflict and
capacity cache misses during prefetching. Given a set-associative
cache, we lay out memory such that the number of conflicting
lines in each cache set does not exceed its capacity (i.e. the number
of ways). This applies to both the L1 and the LL caches.
Figure 2b shows how isolation is realized in the distribution
phase of the Melbourne shuffle. First, data is prefetched from the
enclave memory to the cache. Second, the non-oblivious computation runs on
the cached data; this phase is wrapped in TSX transactions to
ensure no cache miss. Third, the end of the transaction triggers the
write-back of cached lines to the enclave memory. The second-phase
transaction, in particular, takes two contiguous memory
regions as input and outputs data to another contiguous memory
region. To avoid conflicts in this layout, we partition the three
regions at the granularity of cache lines and check in advance (at
compilation time) that the number of conflicts in each cache set does not
exceed the number of ways.
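The layout check described above can be sketched as a small offline computation. The model is an assumption: a cache indexed by simple modulo of the line number, with defaults of 64-byte lines, 64 sets and 8 ways (the geometry of a 32 KB, 8-way L1 data cache). Real last-level caches typically use more complex indexing that this sketch ignores. The function names are hypothetical.

```python
from collections import Counter


def set_occupancy(regions, line_size=64, num_sets=64):
    """Count distinct cache lines per set for (start_addr, length_bytes) regions."""
    lines = set()
    for start, length in regions:
        first = start // line_size
        last = (start + length - 1) // line_size
        lines.update(range(first, last + 1))
    return Counter(line % num_sets for line in lines)


def fits_without_self_eviction(regions, ways=8, line_size=64, num_sets=64):
    """True if no cache set is asked to hold more lines than it has ways."""
    occupancy = set_occupancy(regions, line_size, num_sets)
    return all(count <= ways for count in occupancy.values())


# Two input regions and one output region, 8 KB each, laid out back to back.
contiguous = [(0, 8192), (8192, 8192), (16384, 8192)]
# Nine single-line regions spaced 4096 bytes apart all land in the same set.
clashing = [(i * 4096, 64) for i in range(9)]

contiguous_ok = fits_without_self_eviction(contiguous)
clashing_ok = fits_without_self_eviction(clashing)
```

The contiguous layout places six lines in every set and fits, whereas the strided layout asks one set to hold nine lines and would evict a prefetched line, aborting the transaction.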
3.5 Implementation Notes
Abort Handling: Transaction aborts are handled by re-executing
the transactions. Before entering a transaction, the context (all
the values in registers) is saved to a memory area pointed to by a
reserved register, R15. Upon an abort, the handler reloads the
pre-transaction context from the area pointed to by R15 and jumps back to the
beginning of the transaction to re-execute it. An important property
is that the value of R15 must be preserved through both the regular
transactional path and the abort path.
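The save, abort, reload and retry loop can be sketched in software terms. This is an analogy rather than the register-level mechanism: a dictionary stands in for the register context, an exception stands in for a hardware abort, and the retry limit max_retries is an added assumption (the text does not mention one). The names run_transaction and TransactionAbort are hypothetical.

```python
class TransactionAbort(Exception):
    """Stands in for a hardware transaction abort."""


def run_transaction(body, context, max_retries=1000):
    """Run body(context_copy) until it commits; return (result, attempts)."""
    saved = dict(context)  # context saved before entering the transaction
    for attempt in range(1, max_retries + 1):
        working = dict(saved)  # abort path: reload the pre-transaction context
        try:
            result = body(working)
        except TransactionAbort:
            continue  # jump back to the beginning and re-execute
        context.clear()
        context.update(working)  # commit: make the changes visible
        return result, attempt
    raise RuntimeError("transaction kept aborting")


attempts_seen = []


def flaky_body(ctx):
    """Aborts on its first two executions, then succeeds."""
    attempts_seen.append(1)
    ctx["x"] += 1
    if len(attempts_seen) < 3:
        raise TransactionAbort()
    return ctx["x"]


registers = {"x": 10}
outcome = run_transaction(flaky_body, registers)
```

Because every attempt restarts from the saved snapshot, partial updates made by an aborted attempt never leak into the committed state.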
CPUID: Cache information (e.g. cache sizes) is obtained
by calling functions provided in the Intel SGX SDK [4], which
switch out to the untrusted world and call the CPUID instruction.
Here, the result of CPUID cannot necessarily be trusted. The enclave
can run a test program to evaluate the cache sizes itself. Concretely,
it can read a series of arrays with increasing lengths inside transactions; the
maximal length that does not abort the transaction is the size of the LLC.
Similarly, it can write to a series of arrays of increasing length to
obtain the maximal length as the L1 cache size.
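The probing idea can be sketched independently of the hardware. The sketch below assumes a caller-supplied predicate runs_without_abort(size) that reports whether a transaction touching size bytes commits, and assumes the predicate is monotonic (it never succeeds again after the first failing size). It uses doubling followed by binary search, which is one possible search strategy rather than the one used in the implementation. All names, the starting size and the search limit are hypothetical.

```python
def probe_capacity(runs_without_abort, start=1024, limit=1 << 30):
    """Largest size in [start, limit] that commits, or 0 if even start aborts."""
    size = start
    if not runs_without_abort(size):
        return 0
    # Doubling phase: grow until a size aborts or the limit is reached.
    while size * 2 <= limit and runs_without_abort(size * 2):
        size *= 2
    # Binary search between the last success and the first known failure.
    low, high = size, min(size * 2, limit + 1)
    while high - low > 1:
        mid = (low + high) // 2
        if runs_without_abort(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in predicates; on real hardware they would run TSX transactions.
fake_l1 = probe_capacity(lambda size: size <= 32 * 1024)
fake_llc = probe_capacity(lambda size: size <= 8 * 1024 * 1024)
```

On real hardware transactions also abort because of interrupts, so each probe would need to be repeated several times before a size is declared too large.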
Randomness generation: The Melbourne shuffle is a randomized
algorithm, and we generate true randomness using the SDK-provided
function sgx_read_rand.
4 Evaluation
This section evaluates the performance and abort rate of the CMOS
system. Specifically, it aims at answering the following two
questions.
[Figure 2a: the permutation and data columns are paired, distributed into buckets (Distribute), and sorted within buckets (Sort) to give the result column.]
(a) Melbourne shuffle: The example shows the shuffle of data array (a,b,c,…,i) based on
permutation (3,1,6,5,7,2,0,8,4). The bucket used in the distribute pass is vertical and the
bucket in the sort pass is horizontal.
[Figure 2b: an SGX-TSX CPU and enclave memory holding the data array and permutation array; step 1, Tx_begin and prefetch into the TSX transaction; step 2, Tx_commit and write back.]
(b) Melbourne shuffle run in SGX/TSX
Figure 2. Engineering Melbourne shuffle algorithms on SGX/TSX
// Full shuffle: three passes, randomized by a temporary permutation perm_r.
int[] Melbourne_shuffle(int[] data, int[] perm) {
    int[] perm_r  = gen_perm();
    int[] data_r  = shuffle_pass(data, perm_r);
    int[] perm_rr = shuffle_pass(perm, perm_r);
    return shuffle_pass(data_r, perm_rr);
}

// One pass: distribute into buckets, then clean up each bucket.
int[] shuffle_pass(int[] data, int[] perm) {
    int[][] inter = distribute(data, perm);
    return cleanup(inter);
}

// Distribution: one TSX transaction per bucket.
int[][] distribute(int[] data, int[] perm) {
    int buckets = (int) sqrt(length(data));
    int[][] inter = new int[buckets][];
    for (int i = 0; i < buckets; i++) {
        inter[i] = tx_bucket_perm(data, perm, i);
    }
    return inter;
}

// Cleanup: sort each bucket inside its own TSX transaction.
int[] cleanup(int[][] inter) {
    List res = new List();
    for (int i = 0; i < length(inter); i++) {
        res.add(tx_bucket_sort(inter, i));
    }
    return res.toarray();
}
Figure 3. Melbourne shuffle mapped to TSX transactions
• What is the performance of CMOS compared with the baselines
of word-oblivious shuffle and pure transaction-based
protection?
• What is the abort rate of CMOS compared with the
implementation without prefetching?
4.1 Performance
We consider two baselines for performance comparison against
CMOS. The first baseline (BL1) packs the naive, non-oblivious
shuffle algorithm in TSX transactions, whose time complexity of
O(N) is better than the O(√N log N) complexity of the Melbourne
shuffle. The second baseline (BL2) is the word-oblivious shuffle that is
realized by running a bubble sort on the permutation array. This
baseline has worse time complexity than CMOS, but it does not
need to run transactions, which add performance uncertainty.
Experiment setup: We ran all the experiments on a laptop
with an Intel 8-core i7-6820HK CPU at 2.70 GHz, 32 KB L1 and 8 MB
LL cache, 32 GB RAM and 1 TB disk. This is one of the Skylake
CPUs equipped with both SGX and TSX features. We use numeric
datasets and generate them randomly.
[Figure 4 plot: x-axis, array length (# of integers), from 16 to 65536; y-axis, execution time (usec), log scale from 10^0 to 10^7; series: CMOS, Shuffle in Tx (BL1), Shuffle by sort (BL2).]
Figure 4. Execution time of CMOS and other shuffle baselines
In the evaluation, we measure the execution time and the maximal
data size supported. In transaction-based approaches, the
individual transaction is bounded by the L1 cache size, which on our
platform is 32 KB.
The performance result is illustrated in Figure 4. It can be seen
that BL1 is the most efficient but has limited scalability. BL2 has
the best data scalability but may not be efficient, especially when
the data size is large. CMOS starts to show superior performance when
the data size grows beyond roughly 4096. CMOS can scale to larger
datasets than BL1 because of its smaller space complexity. It has
smaller execution time than BL2 because of the better time
complexity.
4.2 Transaction Abort Rate
One of the design goals of prefetch in CMOS is to reduce the rate
of self-abort. To evaluate the design effectiveness, we run the
program several times and measure the average abort rate. We
consider a first baseline (BL1) that simply skips the external prefetching.
This baseline causes transactions to abort due to both
interrupts (AC4) and cache write-back (AC2) (recall § 2.2). We also
consider two additional baselines that do not cause write-back, by
placing nop instructions in a transaction. This way, aborts in these baselines
are caused only by interrupts. We implement two variants: the
second baseline (BL2), named interrupt-lower, runs enough nop
instructions that the running time equals that of the case where all
memory references are resolved by cache hits. The third baseline (BL3), named
interrupt-upper, runs enough nop instructions that the
running time equals that of the case where all memory references are resolved
by cache misses. Note that to determine the appropriate number of
nop instructions, we profile our SGX hardware using some
benchmark programs (e.g. calling cache-flush instructions to measure
the cache-miss time).
[Figure 5a plot, "With Baseline 1,2": x-axis, in-transaction data size (bytes), from 256 to 4096; y-axis, number of aborts, from 0 to 6000; series: CMOS, BL1 (WB + Interrupt), BL2 (Interrupt-Lower).]
[Figure 5b plot, "With Baseline 3": x-axis, in-transaction data size (bytes), from 256 to 4096; y-axis, number of aborts, from 0 to 1000; series: CMOS, BL3 (Interrupt-Upper).]
Figure 5. Number of transaction aborts
We conduct experiments by running the four approaches above
multiple times. We vary the size of data referenced inside the
transaction and report the number of transaction aborts during the
execution in Figure 5. As can be seen, CMOS causes
fewer aborts than BL1, close to BL2, and many fewer than BL3.
Given that BL2 is the ideal lower bound on aborts, the result shows that
the CMOS design of double prefetch is highly effective in reducing
cache write-back aborts. In addition, CMOS is approximately
on par with the ideal lower bound, which shows that the CMOS
design eliminates cache write-back aborts, as the ideal baseline cannot be
aborted by cache write-back.
5 Related Work
In this section, we survey the related work on defending against
memory-access side-channel attacks on hardware enclaves.
Making memory accesses oblivious is a feasible defense strategy
against side-channel attacks in general. Compared with the cumbersome
Oblivious RAM protocols [14, 27], oblivious algorithms [22] are
more lightweight and are promising for practical attack
defense. Various data-analytical systems have been developed based on
computation-specific oblivious algorithms, such as Opaque [29]
for relational data analytics and oblivious machine learning [24].
These systems instantiate the notion of word-obliviousness, whose
goal is to make memory references at word granularity
oblivious. ObliVM [22] is a source-to-source program transformation
that translates an annotated program into efficient oblivious
algorithms. This line of research does not consider external-oblivious
algorithms and does not attain cache-miss obliviousness
on SGX.
T-SGX [25] takes a general-purpose approach to defending against the
page-fault side-channel attack [28]. T-SGX’s approach is to assume
the allocated memory is large enough to hold the data referenced
by the application, such that there is no page fault during
execution. It leverages the TSX capability of detecting page faults in
user-space programs. It partitions the program and wraps the
partitions into individual TSX transactions. Similarly, the work in [26] takes
a compiler approach to defending against page-fault-based side-channel
attacks. It is based on the notion of page-fault obliviousness.
This work differs from T-SGX in the following senses: 1)
The goals are different: T-SGX is a defense against page-fault
side-channel attacks, whereas this work aims to defend against all software/hardware
memory-access pattern attacks on SGX. T-SGX makes page faults
in the enclave unobservable, while this work makes enclave
execution cache-miss oblivious. 2) The approaches are different: T-SGX
defends by detecting page faults inside TSX transactions, while
this work defends by detecting cache misses inside TSX
transactions. While both rely on TSX, their use of TSX
transactions is fundamentally different, as elaborated in the
next paragraph. 3) The applicability is different: T-SGX is a
general-purpose, compiler-based solution, whereas this work is specific to data
analytics and leverages the corresponding “semantics” for better
efficiency. In addition, T-SGX mainly focuses on the security of
code-page execution, while this work focuses on data obliviousness.
The use of TSX transactions in T-SGX and in this work is different.
T-SGX assumes a memory large enough to store all data pages
and ensures no page fault during enclave execution. Given the
limited “size” allowed by a TSX transaction, T-SGX packages enclave
computation in as many TSX transactions as needed and ensures
the security of page accesses across transactions by placing all the
code outside transactions on a single page (the so-called Springboard
page). It is important to note that the Springboard code page does not
access memory on any other pages. In this work, the TSX
transactions are used to isolate the enclave computation inside the
processor and to detect any (unexpected) cache misses. Across
transactions, data security is ensured by running external-oblivious
algorithms. This work is applicable to a more realistic setting when
handling a large volume of data: we allow page faults
during program execution, and the working-set memory can be much
smaller than the original dataset.
6 Conclusion
This work defends against enclave side-channel attacks by cache-miss
obliviousness. The proposed approach is to software-engineer the
target oblivious computation on top of the SGX and TSX platform.
It proposes cache-miss oblivious algorithms with small trusted
space. It has several software-engineering strategies that package
the computation into TSX transactions, achieving cache-miss
obliviousness. Initial evaluation shows that the performance
overhead of cache-miss obliviousness is much smaller than that of the
strawman approach.
Acknowledgment
This research is partly supported by the Cyber Research Institute
in Rome, NY, under Grant Number #28254.
References
[1] [n. d.]. ARM TrustZone, https://goo.gl/Ks5FFq.
[2] [n. d.]. Google Cloud Platform supports Intel SGX.
[3] [n. d.]. IBM SCPU, http://www-03.ibm.com/security/cryptocards/.
[4] [n. d.]. Intel Corp. Software guard extensions programming reference.
[5] [n. d.]. Intel TXT, http://www.intel.com/technology/security/downloads/TrustedExec_Overview.pdf.
[6] [n. d.]. Intel’s Transactional Synchronization Extensions (Intel’s TSX)
Overview.
[7] [n. d.]. Introducing Azure confidential computing.
[8] [n. d.]. TPM, http://www.trustedcomputinggroup.org/tpm-main-specification/.
[9] 2015. 2015 IEEE Symposium on Security and Privacy, SP 2015,
San Jose, CA, USA, May 17-21, 2015. IEEE Computer Society.
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7160813
[10] José Bacelar Almeida, Manuel Barbosa, Gilles Barthe,
François Dupressoir, and Michael Emmi. 2016. Verifying
Constant-Time Implementations, See [17], 53–70.
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/almeida
[11] Ittai Anati, Shay Gueron, Simon P Johnson, and Vincent R Scarlata. [n. d.].
Innovative Technology for CPU Based Attestation and Sealing. ([n. d.]).
[12] Jo Van Bulck, Nico Weichbrodt, Rüdiger Kapitza, Frank Piessens, and
Raoul Strackx. 2017. Telling Your Secrets without Page Faults: Stealthy
Page Table-Based Attacks on Enclaved Execution, See [19], 1041–1056.
https://www.usenix.org/conference/usenixsecurity17
[13] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. IACR
Cryptology ePrint Archive 2016 (2016), 86. http://eprint.iacr.org/2016/086
[14] Oded Goldreich and Rafail Ostrovsky. 1996. Software Protection
and Simulation on Oblivious RAMs. J. ACM 43, 3 (1996), 431–473.
https://doi.org/10.1145/233551.233553
[15] Le Guan, Jingqiang Lin, Bo Luo, and Jiwu Jing. 2014. Copker:
Computing with Private Keys without RAM. In 21st Annual Network
and Distributed System Security Symposium, NDSS 2014, San
Diego, California, USA, February 23-26, 2014. The Internet Society.
http://www.internetsociety.org/doc/copker-computing-private-keys-without-ram
[16] Le Guan, Jingqiang Lin, Bo Luo, Jiwu Jing, and Jing Wang. 2015. Protecting
Private Keys against Memory Disclosure Attacks Using Hardware Transactional
Memory, See [9], 3–19. https://doi.org/10.1109/SP.2015.8
[17] Thorsten Holz and Stefan Savage (Eds.). 2016. 25th USENIX Security Symposium,
USENIX Security 16, Austin, TX, USA, August 10-12, 2016. USENIX Association.
https://www.usenix.org/conference/usenixsecurity16
[18] Jonathan Katz and Yehuda Lindell. 2007. Introduction to Modern Cryptography.
Chapman and Hall/CRC Press.
[19] Engin Kirda and Thomas Ristenpart (Eds.). 2017. 26th USENIX Security
Symposium, USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017. USENIX
Association. https://www.usenix.org/conference/usenixsecurity17
[20] Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon
Kim, and Marcus Peinado. 2017. Inferring Fine-grained Control
Flow Inside SGX Enclaves with Branch Shadowing, See [19], 557–574.
https://www.usenix.org/conference/usenixsecurity17
[21] Chang Liu, Austin Harris, Martin Maas, Michael W. Hicks, Mohit Tiwari,
and Elaine Shi. 2015. GhostRider: A Hardware-Software System for Memory
Trace Oblivious Computation. In Proceedings of the Twentieth International
Conference on Architectural Support for Programming Languages and
Operating Systems, ASPLOS ’15, Istanbul, Turkey, March 14-18, 2015, Özcan
Özturk, Kemal Ebcioglu, and Sandhya Dwarkadas (Eds.). ACM, 87–101.
https://doi.org/10.1145/2694344.2694385
[22] Chang Liu, Xiao Shaun Wang, Kartik Nayak, Yan Huang, and Elaine Shi. 2015.
ObliVM: A Programming Framework for Secure Computation, See [9], 359–376.
https://doi.org/10.1109/SP.2015.29
[23] Olga Ohrimenko, Michael T. Goodrich, Roberto Tamassia, and Eli Upfal. 2014.
The Melbourne Shuffle: Improving Oblivious Storage in the Cloud. In
Automata, Languages, and Programming - 41st International Colloquium, ICALP
2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part II. 556–567.
https://doi.org/10.1007/978-3-662-43951-7_47
[24] Olga Ohrimenko, Felix Schuster, Cédric Fournet, Aastha Mehta, Sebastian
Nowozin, Kapil Vaswani, and Manuel Costa. 2016. Oblivious
Multi-Party Machine Learning on Trusted Processors, See [17], 619–636.
https://www.usenix.org/conference/usenixsecurity16
[25] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T-SGX:
Eradicating controlled-channel attacks against enclave programs.
[26] Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek Saxena.
2016. Preventing Page Faults from Telling Your Secrets. In Proceedings of the 11th
ACM on Asia Conference on Computer and Communications Security, AsiaCCS
2016, Xi’an, China, May 30 - June 3, 2016, Xiaofeng Chen, XiaoFeng Wang, and
Xinyi Huang (Eds.). ACM, 317–328. https://doi.org/10.1145/2897845.2897885
[27] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher W. Fletcher, Ling Ren,
Xiangyao Yu, and Srinivas Devadas. 2013. Path ORAM: an extremely simple
oblivious RAM protocol. In 2013 ACM SIGSAC Conference on Computer and
Communications Security, CCS’13, Berlin, Germany, November 4-8, 2013. 299–310.
https://doi.org/10.1145/2508859.2516660
[28] Yuanzhong Xu, Weidong Cui, and Marcus Peinado. 2015. Controlled-Channel
Attacks: Deterministic Side Channels for Untrusted Operating Systems, See [9],
640–656. https://doi.org/10.1109/SP.2015.45
[29] Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada
Popa, Joseph E. Gonzalez, and Ion Stoica. 2017. Opaque: An
Oblivious and Encrypted Distributed Analytics Platform. In 14th
USENIX Symposium on Networked Systems Design and Implementation,
NSDI 2017, Boston, MA, USA, March 27-29, 2017. 283–298.
https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zheng
