Evaluating Information Leakage by Quantitative and Interpretable Measurements by Zhou, Ziqiao
EVALUATING INFORMATION LEAKAGE BY QUANTITATIVE AND INTERPRETABLE
MEASUREMENTS
Ziqiao Zhou
A dissertation submitted to the faculty at the University of North Carolina at Chapel Hill in

Ziqiao Zhou: Evaluating Information Leakage by Quantitative and
Interpretable Measurements
(Under the direction of Michael Reiter)
Noninterference, a strong security property for a computation process, informally says that the
process output is insensitive to the value of its secret inputs – the secret inputs do not “interfere”
with those outputs. This is too strong, however; a degree of interference is necessary in almost all
real systems. In this dissertation, we propose a measure of noninterference that is more practical.
Based on a model of computations with three types of input (secret, attacker-controlled, and others)
and an attacker-observable output, we define a noninterference measure that can assess and explain
information leaks in actual codebases.
We start with assessing a new defense against cache-based side-channel attacks in a cloud
environment, using an experiment-based quantitative measure of leakage against existing attacks.
It is not enough to measure leakage through empirical analysis, however, as it fails to identify new
interference introduced by a weak defense design. We propose a symbolic execution framework to
formally measure interference in simple software procedures, encompassing any interference from a
set of secret inputs to observable outputs. Leveraging approximate model counting techniques, we
make this framework scalable with parallelization. Unfortunately, this technique does not scale to
support analysis of hardware processor designs, in part due to its reliance on symbolic execution
to create a logical postcondition of the computation. We thus modified the framework to sidestep
symbolic execution when analyzing processor designs. To further tame the complexity due to various
sources of interference, we extend our framework to remove, or declassify, certain interference
from consideration, so that the framework instead highlights other forms of interference, and to
provide human-interpretable rules that explain the conditions under which interference occurs. We
demonstrate the practicality of the work through case studies of both software-based leakage and

Foremost, I would like to thank my committee members, Dr. Jack Snoeyink, Dr. Montek Singh,
Dr. David Evans, and Dr. Ilya Mironov, for agreeing to trudge through my dissertation. I owe my
deepest appreciation to my advisor, Dr. Michael Reiter, for his generous support of my
research. His endless enthusiasm and rigorous attitude always helped me make our work better than
expected. Our weekly meetings were among my most valuable moments. Being Mike’s student was
undoubtedly the right choice.
I also want to acknowledge my collaborators and mentors. Although I am the only author of this
dissertation, I avoid “I” in the chapters to credit the contributions of my advisor and my collaborators. I
thank Dr. Yinqian Zhang for sharing his research experience to help me get started and for providing
useful feedback about the system design, both when he was a postdoctoral researcher in the same group
and after he joined OSU. I thank Dr. Zhiyun Qian for his active collaboration in static analysis
and his insightful feedback on my case studies. Many thanks to our Intel contacts, Dr. Matthew
Fernandez for his encouragement of my research and his voluntary help with my job search. Special
thanks to my mentor Dr. Junghwan Rhee for his patient supervision at NEC Lab. The internship
with him gave me ample opportunity to learn from machine learning experts, which
greatly broadened my research horizons. I appreciate my mentor at Google, Dr. Michael Vrable, for
providing me with career advice and support. With his help, I learned the value of industry problems,
which are easy for small systems but become tricky in large systems.
I thank my labmates and my friends at UNC-Chapel Hill. With their help, I never felt
alone. In particular, thanks to Andrew Chi, Robert Cochran, Adam Humphries, Victor Heorhiadi,
Jung Jiang, Sheng Liu, Marie Nesfield, Ke Coby Wang, and Qiuyu Xiao. I always learned a lot
from the different perspectives in their work.
Very special gratitude goes to the National Science Foundation and Intel for funding the
work.
Completing my Ph.D. far from my hometown was not easy for me. What let me come
this far in comfort is my family, particularly my father, my wonderful stepmother, and
my sister, who support and encourage me. I owe a lot to my grandmother, who raised me during my
childhood. I thank Sheng Liu, who knows me and supports me no matter what happens. I could
never have done this without them. I love you all.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
1.1 Security Model
1.2 Information Declassification
1.3 Interpreting Leakage
1.4 Implementation Considerations
CHAPTER 2: BACKGROUND AND RELATED WORK
2.1 Side Channels in CPU Caches
2.1.1 Flush+Reload
2.1.2 Prime+Probe
2.1.3 Speculative Execution CPU Risk
2.1.4 Mitigations and their proof of security
2.2 Generalization of Noninterference Property
2.2.1 Delimited release
2.2.2 Abstract noninterference
2.3 Quantitative Information Flows
2.3.1 Measuring entropy uncertainty
2.3.2 Differential privacy
2.4 Model counting
2.4.1 Hash-based approximate model counting
2.4.2 Projected model counting
CHAPTER 3: A DEFENSE AGAINST CACHE-BASED SIDE CHANNELS WITH EMPIRICAL SECURITY
3.1 Copy-On-Access
3.1.1 Design
3.1.2 Implementation
3.2 Cacheability Management
3.2.1 Design
3.2.2 Dynamic budget ki of cache lines
3.2.3 Implementation
3.3 Security Evaluation
3.3.1 Flush+Reload attacks
3.3.2 Prime+Probe attacks
3.4 Summary
CHAPTER 4: STATIC ANALYSIS OF QUANTITATIVE NONINTERFERENCE
4.1 Quantitative Noninterference
4.1.1 The need to vary n
4.1.2 Procedures with other inputs
4.2 Implementation
4.2.1 From software procedure to logical postcondition
4.2.2 Hash-based model counting for Jn
4.2.3 Hash-based model counting for Ĵn
4.2.4 Parameter settings for computing Jn and Ĵn
4.2.5 Logical postconditions for multiple procedure executions
4.3 Microbenchmark Evaluation
4.3.1 Leaking more about secret values vs. leaking about more secret values
4.3.2 Leaking more over multiple rounds
4.3.3 Leaking the secret conditioned on randomness
4.4 Case Studies
4.4.1 Traffic analysis on web applications
4.4.2 Leakage in compression algorithms
4.4.3 Linux TCP sequence number leakage
4.4.4 Performance
4.5 Discussion and Limitations
4.6 Summary
CHAPTER 5: DECLASSIFICATION AND INTERPRETABILITY
5.1 Measuring Interference with Declassification
5.2 Interpreting Leakage
5.2.1 Noninterference and interference tuples
5.2.2 Interpretation through a rule-based method
5.2.3 Feature engineering
5.3 Implementation
5.3.1 Extracting Πproc(C, I, S, O)
5.3.2 Preprocessing formula for #∃SAT
5.3.3 Measurement with declassification using projected model counting
5.3.4 Sampling N̂S and ÎS for interpretable learning
5.4 Case Studies
5.4.1 BOOM configurations
5.4.2 Logically modeling cache states
5.4.3 Cache-based side channels
5.4.4 Side-channel-resistant cache designs
5.4.5 Leaking exponent in modular exponentiation
5.4.6 Cache-based side channels in speculative execution
5.4.7 Performance
5.5 Limitations
5.6 Summary
CHAPTER 6: CONCLUSION
REFERENCES
LIST OF FIGURES
1.1 Motivating examples
3.1 Copy-On-Access State Transition
3.2 Structure of copy-on-access page lists
3.3 A cacheable queue for one page color in a domain
3.4 Page fault handler for CacheBar
3.5 Reload timings in Flush+Reload attacks on a shared address vs. on an unshared address
3.6 Code snippet for Reload
3.7 Confusion matrix of naïve Bayes classifier
3.8 Accuracy per values of kv and ka
4.1 Relating ηmin and ηmax to min-entropy and mutual entropy, for the idealized model of leakage explored in Sec. 4.1.1
4.2 An example showing limitations of J on procedures with randomness and improvements offered by Ĵ (see Sec. 4.1.2)
4.3 Workflow of evaluating leakage
4.4 A procedure that leaks about more secrets as M is decreased (see Sec. 4.3.1)
4.5 A procedure that leaks more about secret values as M is increased (see Sec. 4.3.1)
4.6 Leakage of procedure that checks a guess of secret’s residue class modulo M (see Sec. 4.3.1–4.3.2)
4.7 An example illustrating leakage dependent on randomness (see Sec. 4.3.3)
4.8 Analysis of auto-complete feature
4.9 Leakage from Gzip and Smaz (see Sec. 4.4.2)
4.10 A code snippet for Linux TCP implementation
4.11 TCP sequence-number leakage (see Sec. 4.4.3)
4.12 Average time per estimate (J(S, S′) or Ĵ(S, S′)) and most expensive overall time (Jn or Ĵn) for case studies
5.1 Declassification example
5.2 Finding linear combinations of features near anchor points
5.3 Generating examples in ÎS using EF-solver
5.4 Way-associated cache in BOOM
5.5 Logical architecture for GShare branch predictor
5.6 Ĵn for Prime+Probe attacks
5.7 Ĵn for Flush+Reload attacks
5.8 ScatterCache
5.9 PhantomCache (r = 2)
5.10 ScatterCache, unknown Mdom, memory sharing enabled (Flush+Reload attack)
5.11 PhantomCache, unknown Mrdom, memory sharing enabled (Flush+Reload attack)
5.12 Memory sharing disabled (Prime+Probe attack), ∆(‘info’) ← I(M) (or I(Mr))
5.13 Memory sharing enabled (Flush+Reload attack), ∆(‘info’) ← I(M) (or I(Mr))
5.14 Sliding window modular exponentiation with window size W
5.15 Ĵn for Modexp in 2-way, 8-set cache
5.16 Speculative execution example
5.17 Ĵδn for Spectre in different procedures
5.18 Time used in one estimation of Ĵδ(S, S′)
5.19 Time used in generating one tuple in N̂S or ÎS
LIST OF TABLES
4.1 Postcondition generation times for case studies
4.2 Examples from YS \ YS′ for samples S, S′ (r = 1) in CRIME attacks
5.1 CNF file size for extracted logic formulas
LIST OF ABBREVIATIONS
#∃SAT Projected Model Counting Problem
CRIME Compression Ratio Info-leak Made Easy
LLC Last-level Caches
SAT Satisfiability Problem
#SAT Model Counting Problem
CNF Conjunctive-normal-form
COA Copy-On-Access
KSM Kernel Same-page Merging
LFSR Linear-feedback Shift Register
OS Operating System
PaaS Platform-as-a-service
PTE Page Table Entry
PTW Page Table Walker
QIF Quantitative Information Flow
VM Virtual Machine
CHAPTER 1: INTRODUCTION
Unintended information leaks in resource-shared environments are a persistent problem in computer
systems. These leaks are caused by improper information flows that transfer information
from secret variables to public channels. In part, such leakage arises from developers’
inability to test for it, since information leaks are typically not evident in functional testing.
Static program analysis is one approach to discovering information leaks before they occur. However,
tools for static analysis suffer from a variety of shortcomings. First, most existing information-flow
tools (e.g., taint analysis and model checking) do not measure leakage, but merely detect it.
Second, due to the complexity of control-dependent assignments (i.e., implicit flows) and the lack
of domain knowledge within the detector, detection tools raise false alarms.
Measurement is important since perfect security is usually not possible in the real world. For
example, to defend against cache-based side channels, many mitigations [101, 93, 90] proposed in
recent years only increase the difficulty of exploiting cache channels but do not fully close them.
Proving the security of those defenses is challenging, as some defenses may eliminate known
information flows but open new ones. In Chapter 3, we design a software method called CacheBar
to defend against cache-based side channels in last-level caches (LLC). The defense is shown to be
effective using both attack-specific empirical tests and model checking. However, empirical tests
do not capture information flows that the tests performed do not trigger; those flows are detected
instead by automated model checking. That is, empirical tests provide lower coverage of leakage
sources in the defense system than static methods.
Quantitative information flow (QIF) [29, 46] has been proposed to measure the amount of leakage from
a computation. Due to the computational complexity, existing QIF implementations (e.g., [96, 16,
77, 48]) tend to sacrifice measurement fidelity by using empirical data or abstract models, instead
of static analysis on actual code bases. Furthermore, the measures used in earlier QIF works do
not capture the leakage pattern. For example, entropy loss, a popular metric used in many QIF
theoretical studies, only expresses the “expected” number of bits leaked: a computation may leak
that number of bits in all executions, or leak many more bits in some cases but few bits in others.
Chapter 4 proposes a more refined measure of leakage to help developers measure interference
patterns conditioned on sets of secret values. Chapter 5 extends the framework to accommodate
computations on specific hardware processor designs, while providing ways to decompose causes of
leakage and explain the causes using an interpretable model.
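As a hedged illustration of why an expected-value metric can hide the leakage pattern, the sketch below compares two hypothetical procedures over a uniformly distributed 8-bit secret. The procedures, the threshold 27, and the helper names are assumptions made for this example and do not come from the dissertation; the per-execution figure is the number of bits by which a single observation narrows the set of candidate secrets, and its average equals the Shannon entropy of the output for a deterministic procedure.

```python
from collections import Counter
from math import log2

SECRETS = range(256)  # assumption: uniformly distributed 8-bit secret


def leak_parity(secret):
    """Hypothetical procedure: reveals the lowest bit in every execution."""
    return secret & 1


def leak_small(secret):
    """Hypothetical procedure: reveals the whole secret, but only if it is small."""
    return secret if secret < 27 else -1


def per_execution_bits(proc):
    """Map each secret to log2(|secrets| / |secrets sharing its output|)."""
    sizes = Counter(proc(s) for s in SECRETS)
    return {s: log2(len(SECRETS) / sizes[proc(s)]) for s in SECRETS}


def expected_bits(proc):
    """Average per-execution leakage (Shannon entropy of the output)."""
    bits = per_execution_bits(proc)
    return sum(bits.values()) / len(bits)


if __name__ == "__main__":
    for proc in (leak_parity, leak_small):
        bits = per_execution_bits(proc)
        print(proc.__name__, expected_bits(proc), min(bits.values()), max(bits.values()))
```

Under these assumptions both procedures leak roughly one bit in expectation, yet the first leaks exactly one bit in every execution while the second leaks all eight bits for a few secrets and a small fraction of a bit for the rest.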
This dissertation is structured as follows. It first introduces some terminology to depict an
end-to-end security model called noninterference in Sec. 1.1. The model guides us in defining
and measuring leakage, starting with an empirical evaluation for cache-based side-channel attacks
in Chapter 3 and then maturing to a static measurement framework in Chapter 4. In addition,
Sec. 1.2 and Sec. 1.3 present the challenges in real-world noninterference evaluations, which are
addressed by a modified framework that accommodates hardware designs in Chapter 5. We also
present the toolchains implementing the proposed methodologies and use them to evaluate software
and hardware computations. Those results together demonstrate the dissertation statement:
Information leakage from computation can be quantitatively measured and expressed
in a human-interpretable model. Using this model, one can evaluate existing
codebases and modifications thereof to restrict information flows.
1.1 Security Model
Noninterference [44] is a commonly used end-to-end model for information-flow security. In
the noninterference model, a secured computation is seen as a machine having inputs and outputs
where its secret does not determine what lower-level users (or attackers) can see or gain access to.
In other words, the secret inputs do not “interfere” with those outputs. Motivated by the security
model, this dissertation targets the information leaked about secret variables input to a procedure
proc. More specifically, we divide the formal input parameters to procedure proc into three disjoint
sets, namely VarsC, VarsS, and VarsI, having the following properties.
• cvar ∈ VarsC takes on a value C(cvar) controlled by the attacker.
• svar ∈ VarsS takes on a value S(svar) that is a secret for which an analyst is specifically concerned
with detecting leakage via the outputs VarsO. Unless otherwise stated, we will assume there is
one secret input variable ‘secret’, and so VarsS = {‘secret’}.
• ivar ∈ VarsI takes on a value I(ivar) that is not controlled by the attacker. The role of ivar
is discussed in Sec. 4.1.2.
In addition, there is a set VarsO such that each ovar ∈ VarsO takes on a value O(ovar) that is
observable by the attacker. So we consider proc to be of the form
O ← proc(C, I, S)
This dissertation assumes that proc is deterministic; a nondeterministic computation proc can
be rendered deterministic by adding any randomness as inputs, say I(‘coins’). A given proc can
then be characterized by a logical postcondition Πproc(C, I, S, O) that constrains how the values in
O relate to those in C, I, and S in any execution.
A poorly designed procedure (and thus postcondition) may expose information about the secret
value in S through the observable values in O when the attacker chooses the input in C; this is the
information leakage on which our framework focuses. Noninterference rules out such leakage: for one
execution with any S and I, another execution with any S′ must be able (with some I′) to produce
the same O when the attacker supplies the same controlled input C. A formal definition of
noninterference for a procedure proc can be expressed as:
∀C, I, S, S′ : ∃I′, O : Πproc(C, I, S, O) ∧ Πproc(C, I′, S′, O)
However, noninterference is overly strict, excluding any computation outputting some observable
values not feasible for some secret value. This dissertation instead explores the quantitative measure
of noninterference. One possible direction is to measure interference based on empirical data, which
is used in Chapter 3 to evaluate cache-based side channels in a complicated defense. Another
direction is to analyze the victim’s program through static analysis, which is discussed in Chapter 4
to cover more interference cases than empirical evaluations.
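The noninterference condition of this section can be checked by brute force on very small domains. The following is a minimal sketch under stated assumptions: the 3-bit domains, the two example procedures, and the function names are hypothetical and chosen only for illustration; the dissertation itself works with logical postconditions and model counting rather than enumeration.

```python
from itertools import product

BITS = 3                      # assumption: tiny width so enumeration stays cheap
DOMAIN = range(2 ** BITS)


def password_checker(c, i, s):
    """Hypothetical proc: reports whether the attacker's guess c equals secret s."""
    return int(c == s)


def one_time_pad(c, i, s):
    """Hypothetical proc: masks the secret s with the random coins i."""
    return s ^ i


def is_noninterfering(proc):
    """Check that for all C, I, S, S' there exist I', O such that both
    executions (C, I, S) and (C, I', S') can produce the same output O.
    Since proc is deterministic, O is fixed to proc(C, I, S)."""
    for c, i, s, s2 in product(DOMAIN, repeat=4):
        o = proc(c, i, s)
        if not any(proc(c, i2, s2) == o for i2 in DOMAIN):
            return False
    return True


if __name__ == "__main__":
    print(is_noninterfering(password_checker))  # a correct guess is only possible for one secret
    print(is_noninterfering(one_time_pad))      # some coins explain any output for any secret
```

The password checker fails the check because an output of 1 is feasible for only one secret value, which is exactly the kind of unavoidable interference that motivates a quantitative measure.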
1.2 Information Declassification
In the real world, a computation may have inevitable leaks. For example, a password checker
that reveals whether the password matches the attacker’s input has an insecure flow from the
confidential password to the output. Such an insecure flow is due to the designed functionality and
is an intended leak.
Information declassification is an approach to exempt such allowed information flows [84].
Information-flow research uses declassification to avoid reporting the leakage due to these allowed
releases of information. However, those works only detect but do not measure additional leakage.
Without exempting the declassified information from the leakage, a measure may incorrectly
estimate the illegal leakage of a target program.
Chapter 5 proposes a methodology to use declassification in leakage measurement. The methodology
enables analysts to declassify certain information, thereby focusing the measurement on any
other leakage that might be occurring, i.e., leakage that cannot be inferred from the declassified
information. For systems as complex as modern processors, this ability is essential to permit analysts
to decompose and analyze leakage in a piecemeal fashion.
Unlike existing declassification research [83, 21], in which declassification only
reduces the leakage, our work defines the actual unintended leakage with an awareness that
declassification of certain information can either increase or decrease the unintended leakage. On
the one hand, if some interference is due to declassification of a part of the secret, our measurement
should not reflect that declassified information. For instance, suppose the procedure proc(C, I, S)
returns the lowest four bits of the secret:
proc(C, I,S) : O(‘result’)← S(‘secret’) mod 16
and the declassified information is whether the secret value S(‘secret’) is even or odd (i.e., S(‘secret’)
mod 2). The interference measure should be decreased by this declassification, as the interference
caused by the lowest bit has been allowed. On the other hand, if the declassification increases the
risk to the secret when an attacker utilizes both the observable output from the computation and
the allowed declassified information, then this increase should be reflected in the measure. For
example, if the computation proc(C, I, S) uses a random variable I(‘coins’):
proc(C, I,S) : O(‘result’)← S(‘secret’) + I(‘coins’)
where the declassified information is I(‘coins’), then it leaks more than proc without declassification,
since an attacker can use O(‘result’) and the declassified I(‘coins’) to reveal more about
S(‘secret’). We demonstrate those effects using both dummy and actual computations in Chapter 5.
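Both effects of declassification can be reproduced with a simple candidate-counting sketch. This is not the measure defined in Chapter 5: it is a simplified stand-in that counts how many secrets remain possible before and after the attacker sees the output, given the declassified value. The 8-bit width, the wrap-around addition, and all function names are assumptions made for this example.

```python
from math import log2

WIDTH = 8                           # assumption: 8-bit secret and coins
SECRETS = range(2 ** WIDTH)
COINS = range(2 ** WIDTH)


def low_four_bits(secret, coins):
    """O('result') <- S('secret') mod 16."""
    return secret % 16


def add_coins(secret, coins):
    """O('result') <- S('secret') + I('coins'); wrap-around is an assumption."""
    return (secret + coins) % (2 ** WIDTH)


def nothing(secret, coins):
    """Declassify nothing."""
    return 0


def parity(secret, coins):
    """Declassify S('secret') mod 2."""
    return secret % 2


def the_coins(secret, coins):
    """Declassify I('coins')."""
    return coins


def extra_bits(proc, declassified, secret, coins):
    """Bits learned from the output beyond what the declassified value reveals."""
    released = declassified(secret, coins)
    observed = proc(secret, coins)
    # Secrets consistent with the declassified value alone.
    prior = {s for s in SECRETS for c in COINS if declassified(s, c) == released}
    # Secrets consistent with the declassified value and the observed output.
    posterior = {s for s in SECRETS for c in COINS
                 if declassified(s, c) == released and proc(s, c) == observed}
    return log2(len(prior) / len(posterior))


if __name__ == "__main__":
    for proc, decl in [(low_four_bits, nothing), (low_four_bits, parity),
                       (add_coins, nothing), (add_coins, the_coins)]:
        print(proc.__name__, decl.__name__, extra_bits(proc, decl, 182, 77))
```

Under these assumptions, declassifying the parity lowers the count for the first procedure from 4 bits to 3, while declassifying the coins raises the count for the second procedure from 0 bits to 8.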
proc (C, I, S)
if (C(‘test’) mod 2 = 1)
(a) Procedure (implicit flow)
proc (C, I, S)
O(‘result’)← C(‘test’) & S(‘secret’) & 1
(b) Procedure (explicit flow)
proc (C, I, S)
O(‘result’)← C(‘test’) & S(‘secret’) & 2
(c) Procedure (different explicit flow)
Figure 1.1: Motivating examples
1.3 Interpreting Leakage
A computation includes both software and hardware implementations. The sheer complexity
of hardware designs (e.g., CPUs) means that once leakage is measured, the exact conditions
that cause this leakage might not be immediately evident. Our work seeks a measure with
interpretability to help developers understand sources of leakage and how to rectify them.
Chapter 4 provides a way to quantitatively measure the leakage for a procedure, which depicts
how much about the secret is leaked due to the interference and how often an interference could
happen. However, it does not explain why the procedure leaks information.
Consider the two procedures shown in Fig. 1.1(a) and Fig. 1.1(b). The procedure in Fig. 1.1(a)
returns 1 if C(‘test’) mod 2 = 0 and S(‘secret’) mod 2 otherwise, and the second procedure in
Fig. 1.1(b) returns the least significant bit of S(‘secret’) & C(‘test’). As such, the two procedures
leak the same information about the secret (i.e., both leak the least significant bit of the secret
when C(‘test’) is odd and nothing otherwise), using different coding styles. Indeed, both procedures
have the same quantitative leakage if we just quantify their interference. However, the quantitative
measure does not express the concrete pairs of inputs causing the interference, thus losing some
information about how the attacker’s controlled and observable variables work together to reveal
the secret. For example, the procedure in Fig. 1.1(c), which reveals the second bit of the secret when
the second bit of C(‘test’) is 1, leaks the same amount of information about a different portion of
the secret under a different attack condition. Merely relying on the quantitative measure proposed
in Chapter 4, we cannot explain to the developer that Fig. 1.1(c) causes a different interference
from Fig. 1.1(a) and Fig. 1.1(b).
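The distinction drawn above can be made concrete by enumerating, for each procedure, the attacker inputs under which the output depends on the secret and the secret bits that influence the output. The sketch below is an illustration under assumptions: values are restricted to 4 bits, the body of `proc_a` follows the textual description of Fig. 1.1(a) given in this section rather than the figure itself, and the helper names are hypothetical.

```python
BITS = 4                          # assumption: small width for exhaustive enumeration
VALUES = range(2 ** BITS)


def proc_a(test, secret):
    """Implicit flow, written from the description of Fig. 1.1(a)."""
    if test % 2 == 1:
        if secret % 2 == 1:
            return 1
        return 0
    return 1


def proc_b(test, secret):
    """Explicit flow of Fig. 1.1(b)."""
    return test & secret & 1


def proc_c(test, secret):
    """Different explicit flow of Fig. 1.1(c)."""
    return test & secret & 2


def interfering_inputs(proc):
    """Attacker inputs for which two secrets yield different outputs."""
    return {c for c in VALUES if len({proc(c, s) for s in VALUES}) > 1}


def leaked_bits(proc):
    """Secret bit positions whose flip changes the output for some input."""
    return {b for b in range(BITS)
            if any(proc(c, s) != proc(c, s ^ (1 << b))
                   for c in VALUES for s in VALUES)}


if __name__ == "__main__":
    for proc in (proc_a, proc_b, proc_c):
        print(proc.__name__, sorted(interfering_inputs(proc)), sorted(leaked_bits(proc)))
```

In this toy setting the first two procedures interfere on the same attacker inputs and the same secret bit, whereas the third interferes on a set of attacker inputs of the same size but exposes a different bit, which is the information a purely quantitative measure does not convey.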
Chapter 5 incorporates a method of interpreting the leakage, i.e., providing simple rules that
indicate circumstances in which leakage will (or will not) occur. Each such rule is additionally
accompanied by a precision and recall, so that analysts can prioritize the rules they address.
1.4 Implementation Considerations
There are several different ways to try to quantitatively measure interference. Depending on
the targeted computations, we may choose different implementations to quantify the leakage.
One possible method to evaluate the security of a complicated system is to replay existing
attacks and measure the difficulty of revealing the secret using the attacks. In Chapter 3, we
illustrate this approach by measuring the attacker’s ability to exploit cache-based side channels
on a machine with and without a defense we propose. We train a classifier to tell how many cache lines
are used by the victim during a computation and calculate a confusion matrix to assess whether an
attacker can use this classifier to distinguish one secret value in the victim program from others.
Since the measurement is based on data collected from specific attacks, it does not guarantee that
the measured leakage is complete under all conditions.
To cover more sources of interference, Sec. 4.2 proposes a static quantification method using
symbolic execution to extract the logical postcondition Πproc of proc. With Πproc, Chapter 4
explores the assessment of leakage vulnerabilities by randomly sampling a space of secret values
and then limiting our search for, and count of, pairs of attacker-controlled inputs and
attacker-observable outputs to only those that are consistent with some secret in that space. Finding
two spaces of secret values for which these counts suggest pairs consistent with one but not both then
reveals interference.
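The idea of comparing the input-output pairs consistent with two sets of secrets can be sketched with exact enumeration. This is only an illustration: the procedure, the 4-bit domain, and the function names are hypothetical, and exhaustive enumeration stands in for the random sampling and approximate model counting over Πproc that the framework actually relies on.

```python
BITS = 4                          # assumption: small width for exhaustive enumeration
VALUES = range(2 ** BITS)


def proc(c, s):
    """Hypothetical proc: leaks the low bit of the secret when c is odd."""
    return c & s & 1


def consistent_pairs(secrets):
    """All (attacker input, output) pairs produced by some secret in `secrets`."""
    return {(c, proc(c, s)) for c in VALUES for s in secrets}


def reveals_interference(secrets_a, secrets_b):
    """True when some pair is consistent with one set of secrets but not the other."""
    return consistent_pairs(secrets_a) != consistent_pairs(secrets_b)


if __name__ == "__main__":
    evens, odds, mixed_a, mixed_b = {0, 2, 4}, {1, 3, 5}, {0, 1}, {2, 3}
    print(len(consistent_pairs(evens)), len(consistent_pairs(mixed_a)))
    print(reveals_interference(evens, odds))       # sets differ in the leaked bit
    print(reveals_interference(mixed_a, mixed_b))  # sets agree on the leaked bit
```

Which pairs of sets expose the interference depends on how the sets are drawn: two sets that both contain even and odd secrets are indistinguishable here, which suggests why the size and number of sampled sets matter.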
To evaluate joint hardware-software vulnerabilities, static analysis (e.g., through symbolic
execution) does not scale, due to the complexity of the hardware. Sec. 5.3 describes an implementation
called DINoMe to statically evaluate hardware-software vulnerabilities. DINoMe targets software
snippets that run for up to hundreds of cycles in open-source CPU processors (e.g., RISC-V BOOM).
Specifically, DINoMe considers a proc composed of a partially symbolic processor and assembly
code (i.e., software), which can complete execution within hundreds of CPU cycles. Instead of
symbolically executing this proc for multiple cycles (e.g., [97]), which would be expensive, DINoMe
generates a transition logic from the current state to the next-cycle state and then efficiently stitches
together a multi-cycle postcondition. Although the dissertation starts with the CacheBar work
and ends with a static methodology for measuring interference, the limitations of DINoMe (discussed
in Sec. 5.5) make it hard to apply to CacheBar.
CHAPTER 2: BACKGROUND AND RELATED WORK
2.1 Side Channels in CPU Caches
Cache-based side channels are important attack vectors that have been researched for many
years (e.g., [99, 95, 68]). The most common cache-based side-channel attacks are Prime+Probe,
Flush+Reload, and their variants.
2.1.1 Flush+Reload
Flush+Reload (e.g., [99, 95]) is a highly effective cache-based side-channel attack that
was used, e.g., by Zhang et al. [99], to mount fine-grained side channels when memory sharing
is enabled. It leverages physical memory pages shared between an attacker and victim security
domains (e.g., due to shared libraries), as well as the ability to evict those pages from cache, using
a capability such as provided by the clflush instruction on the x86 architecture. In the flushing
stage, the attacker Flushes a chunk of the shared memory out of the cache. After a short time
interval (the “Flush+Reload interval”) during which the victim executes, the attacker measures
the time to Reload the same chunk. In this case, the secret data whose value is reflected in the
use of shared memory is S, while the cache hit or miss which is reflected by the timing of Reload
is mapped to O.
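The mapping from S to O described above can be sketched with a toy model. This is only an illustration under strong assumptions: the cache is modeled as an unbounded set of resident lines, timing is abstracted to a hit/miss flag, and the victim (a hypothetical one that touches the shared line exactly when its secret bit is 1) is not taken from any real program.

```python
class ToyCache:
    """Toy cache: a set of resident line addresses (capacity is ignored)."""

    def __init__(self):
        self.resident = set()

    def access(self, addr):
        """Load addr; return True on a cache hit, False on a miss."""
        hit = addr in self.resident
        self.resident.add(addr)
        return hit

    def flush(self, addr):
        """Evict addr from the cache, as clflush would on x86."""
        self.resident.discard(addr)


def flush_reload_round(secret_bit, shared_addr=0x1000):
    """One Flush+Reload interval; returns the attacker's observation O."""
    cache = ToyCache()
    cache.access(shared_addr)         # the shared line starts out cached
    cache.flush(shared_addr)          # Flush stage
    if secret_bit:                    # hypothetical victim: touches the line iff S = 1
        cache.access(shared_addr)
    return cache.access(shared_addr)  # Reload stage: hit (fast) or miss (slow)


secrets = [0, 1, 1, 0]
observations = [flush_reload_round(s) for s in secrets]
```

In this idealized setting the Reload observation equals the secret bit in every round; a real attacker sees a noisy timing measurement instead of a clean hit/miss flag.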
2.1.2 Prime+Probe
Another common method to launch side-channel attacks via caches is using Prime+Probe
attacks, introduced by Osvik et al. [73]. These attacks have recently been adapted to use LLCs
to great effect, e.g., [68, 50]. Unlike a Flush+Reload attack, Prime+Probe attacks do not
require the attacker and victim security domains to share pages. Here, the attacker utilizes the
architectural property of an associative cache that different memory blocks may be stored in the
same cache set. In the Prime stage, the attacker Primes the cache by loading memory blocks into
a target cache set. Subsequently, the attacker Probes the cache by measuring the time to access
the previously loaded memory blocks. The time to do so tells the attacker how many cache lines in
that cache set were evicted by the victim in the interim. Generally, secret data (e.g., the private key
in decryption) whose value decides the use of memory blocks (e.g., indices in some lookup tables,
or the use of key-dependent instructions) mapping to known cache sets is S, while the cache hit or
miss which is reflected by the timing of Probe is mapped to O.
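A minimal sketch of one Prime+Probe interval on a single cache set follows. It assumes an LRU replacement policy, a victim that touches a number of distinct blocks equal to its demand, and an attacker that Probes in the reverse of the Prime order (so that Probe misses do not evict lines that are still to be probed). The class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines and LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # least recently used entry comes first

    def access(self, tag):
        """Access a memory block; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the LRU line
        self.lines[tag] = None
        return False


def prime_probe(victim_demand, ways=4):
    """Return the number of Probe misses the attacker observes."""
    cache_set = LRUSet(ways)
    attacker_blocks = [("A", i) for i in range(ways)]
    for tag in attacker_blocks:              # Prime stage
        cache_set.access(tag)
    for j in range(victim_demand):           # victim runs in the interim
        cache_set.access(("V", j))
    # Probe stage, in reverse order of Prime
    return sum(not cache_set.access(tag) for tag in reversed(attacker_blocks))


misses = [prime_probe(d) for d in range(7)]
```

Under these assumptions the number of Probe misses equals the victim's demand until it saturates at the associativity of the set.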
2.1.3 Speculative Execution CPU Risk
The cause of cache-based side-channel attacks is the poorly controlled information flow from
sensitive data/instructions to shared cache resources. The recent Spectre [57] and Meltdown [65]
attacks further demonstrate the joint risk of speculative execution and cache side channels, in
this case allowing an attacker to read arbitrary memory locations in a victim process. To provide
more concurrency, a CPU predicts the outcome of a conditional branch and speculatively executes
instructions based on that prediction, which avoids the delays those instructions would otherwise
incur if the prediction is correct. However, if the prediction is incorrect, some changes to the
cache caused by speculative execution persist even after the mispredicted computations have been
discarded. Those changes propagate unintended information to exploitable cache-based side channels,
allowing the attacker to steal it.
2.1.4 Mitigations and their proof of security
Numerous proposals have sought to mitigate cache-based side channels at the application, system,
and hardware levels.
The straightforward method to remove a cache-based side channel for a specific application is
to modify the application’s software code to better protect secrets from side-channel attacks. These
solutions range from tools to limit branching on sensitive data (e.g., [27, 28]) to application-specific
side-channel-free implementations (e.g., [58]). However, the overheads of these techniques tend
to grow with the scope of programs to which they apply and can be very substantial (e.g., [81]).
Hardware-based mitigations redesign the cache with stronger isolation between different
security domains. One direction is to manage ownership of cache lines so as to disallow
interference from unauthorized users (e.g., [92, 54, 67]). Other works (e.g., [93, 90])
use domain-specific memory-to-cache mappings to make it harder for an attacker to recover the
memory address being accessed. However, deploying secure cache designs to all machines is
unlikely in the foreseeable future.
System-level countermeasures mitigate cache side channels by modifying the operating
system (OS) or hypervisor. The modifications obfuscate the cache timing or isolate the memory
access across different security domains. For example, several works provide to each security
domain a limited amount of designated memory that is never evicted from the LLC (e.g., [55, 66]),
thereby rendering its contents immune to Prime+Probe side-channel attacks. To mitigate
Flush+Reload, some works [74, 99, 11] have suggested disabling or selectively enabling memory
sharing for countering various side-channel attacks exploiting shared memory, while stopping short
of exploring a complete design for doing so.
Those works usually use theoretical explanations [55, 92] or empirical evaluations to prove their
improved security. A commonly used empirical evaluation targets the timing channel strength in
a specific cryptographic system (e.g., [27, 58, 54, 67, 66]). Specifically, they rerun concrete attacks
targeting a cryptographic system and then use timing statistics (e.g., the timing difference) to reflect
the timing channel strength. Such evaluations do not reflect the attacker’s actual power using a
classifier, which takes timing as input and predicts the secret. Another direction to evaluate the
cache-based side channels depicts the side-channel leakage through the performance of the classifier,
e.g., the expected number of correct bits (e.g., [28]) or the confusion matrix (e.g., [81]). In Chapter 3, we
provide a copy-on-access design as an efficient memory isolation mechanism that selectively enables
memory sharing to address Flush+Reload attacks, and extend this idea with cacheability management
for Prime+Probe defense, as well. Then we demonstrate the improved security using confusion matrices
from empirical evaluations and further check the security using a formal model.
2.2 Generalization of Noninterference Property
The noninterference property provides a strong security guarantee of zero information leakage.
Security type systems enforce noninterference by tracking information flow within programs and
checking security rules used to assign variables with different security labels. However, strict
noninterference makes those systems hard to use in practice due to inevitable flows from high-security
to low-security labels. To make the noninterference model more tractable, declassification [84] pro-
vides a weaker form of information flow policies that defines what information could be released,
where within the code it is released (i.e., locations), who releases it (i.e., users), and when it is
released (i.e., time). To support declassification, many works (e.g., [83, 43, 21, 9, 38]) changes




Sabelfeld et al. [83] introduce a method for defining declassification, called delimited release. It
uses a specialized function declassify to mark a collection of expressions specifying what information
to release. A program then satisfies delimited release if it has the following property: for any
two executions that differ only in the secret values S and S′, the observable values are the
same whenever the values of the expressions marked by declassify are the same.
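For a small finite domain, this property can be checked by brute force. The sketch below is a simplified illustration, not the type-system-based enforcement of [83]: programs are modeled as pure functions of a secret and a public input, the declassified expressions are modeled as a single function of the secret, and all names are hypothetical.

```python
from itertools import product


def satisfies_delimited_release(program, declassified, secrets, publics):
    """Check: equal declassified values imply equal observable outputs."""
    for s1, s2 in product(secrets, repeat=2):
        if declassified(s1) != declassified(s2):
            continue                      # this pair is allowed to be distinguishable
        for low in publics:
            if program(s1, low) != program(s2, low):
                return False              # leak beyond what was declassified
    return True


def parity(secret):
    """The only information the policy allows to be released."""
    return secret % 2


def releases_parity(secret, low):
    return (secret % 2) + low


def releases_two_bits(secret, low):
    return (secret % 4) + low


SECRETS = range(16)
PUBLICS = range(4)
ok = satisfies_delimited_release(releases_parity, parity, SECRETS, PUBLICS)
bad = satisfies_delimited_release(releases_two_bits, parity, SECRETS, PUBLICS)
```

The first program reveals only the declassified parity and passes the check; the second reveals one extra bit (for example, secrets 0 and 2 share a parity but yield different outputs) and fails it.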
In Chapter 5, a modified quantitative measure accepts declassified information, which defines
what secret information may be released, through a developer-defined logical formula.
2.2.2 Abstract noninterference
Giacobazzi et al. [42, 43] introduce the notion of abstract noninterference to weaken
noninterference by parameterizing standard noninterference relative to what an attacker can observe.
Specifically, the abstract noninterference considers properties instead of values as observable objects.
Since the attacker may not be able to access all public values in a program directly, the abstract
noninterference intuitively defines a weaker observer (than an observer directly using values from
the victim’s procedure) through abstract interpretation. To support declassification, it additionally
uses an input property specifying the inputs for which noninterference must be checked.
The case studies for cache-based side channels in Chapter 5 compose the attacker’s observable and
controllable variables instead of directly using existing variables in the cache module, as an attacker
only loads or flushes memory blocks and observes cache hit or miss but does not directly modify
or observe any registers in cache modules.
2.3 Quantitative Information Flows
Another direction for generalizing the noninterference property is to measure a violation instead
of only detecting it. First introduced by Denning [29] and Gray [46] in the 1980s, quantitative
information flow (QIF) measures the amount of information leaked about a secret by observing a
program execution.
2.3.1 Measuring entropy uncertainty
The earliest models of QIF ([29, 46, 22, 23, 24]) use Shannon mutual information to measure
uncertainty about the secret. Smith [86] claims that Shannon entropy fails to accurately capture the
vulnerability of a program, as it does not measure the probability of correct guessing. Furthermore,
Clarkson et al. [25] argue that the uncertainty-based measurement is inadequate as it only measures
the probability of an attacker being absolutely correct or absolutely wrong.
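The gap between Shannon-based leakage and the probability of correct guessing can be illustrated with a small example in the spirit of that argument. The two toy programs below are assumptions chosen for illustration (an 8-bit, uniformly distributed secret and deterministic programs); for such programs the mutual information between secret and output equals the output entropy, and the one-try guessing probability after observing the output equals the number of distinct outputs divided by the number of secrets.

```python
from collections import Counter
from math import log2


def shannon_leakage(program, secrets):
    """I(S;O) in bits; equals H(O) for deterministic programs, uniform S."""
    counts = Counter(program(s) for s in secrets)
    n = len(secrets)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def guessing_probability(program, secrets):
    """Chance of guessing S in one try after seeing O (uniform prior)."""
    return len({program(s) for s in secrets}) / len(secrets)


def prog_a(s):
    """Reveals the whole secret, but only when it is a multiple of 8."""
    return s if s % 8 == 0 else 1


def prog_b(s):
    """Always reveals the two low-order bits of the secret."""
    return s & 0b11


SECRETS = range(256)
leak_a, leak_b = shannon_leakage(prog_a, SECRETS), shannon_leakage(prog_b, SECRETS)
guess_a, guess_b = guessing_probability(prog_a, SECRETS), guessing_probability(prog_b, SECRETS)
```

Here prog_a has the smaller Shannon leakage yet leaves the attacker a one-try success probability of 33/256, roughly eight times that of prog_b (4/256), so ranking the two programs by mutual information alone would understate the risk of prog_a.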
Our work improves on prior work in QIF along one or more of the following dimensions. First,
computing the measures in these works often involves computing outputs induced by sampled
secret values (e.g., [24]), which sometimes leverages application-specific restrictions to be tractable
(e.g., [96]). Our framework proposed in Chapter 4, in contrast, does not require such application-
specific restrictions. Second, exploiting leakage vulnerabilities often requires attackers not only to
observe outputs but also to inject inputs, and many applications incorporate other inputs, as well.
These QIF calculations are not possible without knowing the distributions from which these values
are drawn (e.g., [71]), and so some works (e.g., [59, 78]) heuristically assign specific values to these
unknown inputs, potentially hiding the leakage from other assignments. Our analysis computes
conditionals in a different “direction,” i.e., counting possible combinations of attacker-controlled
inputs and attacker-observable outputs conditioned on sets of secret values and while leaving other
inputs constrained. In doing so, our technique accommodates attacker-controlled inputs but does
not presume knowledge of the attacker’s strategy or the distributions of these or other inputs.
Third, some QIF frameworks work only for deterministic procedures (e.g., [76, 60]), whereas ours
accommodates nondeterministic ones, as well.
2.3.2 Differential privacy
Differential privacy [33] is a criterion for privacy protection that many algorithms have been
devised to satisfy. As originally expressed, differential privacy requires that any output observed
from a computation on a database is insensitive to the existence of any single element (row) in that
database; i.e., the probability of observing a computation output is nearly the same even if any
single row is added or removed. Viewing the database as our secret value, this definition therefore
requires that computations on nearby secrets result in the same attacker-observable outputs, with
high probability. In contrast to differential privacy, our work does not leverage a distance measure
over secrets; i.e., there is no notion of “nearby” secrets in our definition.
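For reference, the standard formulation of this requirement can be written as follows. The symbols are notation introduced here only for this statement and are not used elsewhere in this dissertation: \mathcal{M} is the randomized computation, D and D' are any two databases differing in a single row, T is any set of possible outputs, and \varepsilon is the privacy parameter (unrelated to the error parameter of Sec. 2.4.1).

```latex
\Pr[\mathcal{M}(D) \in T] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in T]
\qquad \text{for all neighboring } D, D' \text{ and all } T \subseteq \mathrm{Range}(\mathcal{M})
```

The "nearby" relation on D and D' is exactly the distance notion that our definition does not rely on.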
Moreover, the focus of our work is somewhat different in providing a way to measure and ex-
plain leakage for arbitrary software or hardware designs, versus in providing a prescriptive measure
to limit that leakage. Notably, since differential privacy requires a statistical guarantee of
indistinguishability for any two nearby databases, it mandates a requirement that typically can be met
only by artificially adding noise to the computation output. As such, this definition
has driven considerable research on algorithms for adding noise to observable outputs to enforce
this condition.
2.4 Model counting
Solution counting, the problem of computing the number of solutions for a given constraint, is
necessary for static analysis of QIF, which does not rely on empirical data. For a propositional
formula F, the counting problem is called the model counting problem (#SAT) [45], where a model is a
feasible solution for F. #SAT is a #P-complete problem. Practical model counting techniques can be
categorized into exact counting and approximate counting.
Exact model counting tends to use DPLL-style exhaustive search. Specifically, it uses a
backtracking algorithm to repeatedly search for a feasible model, block it, and find a new model. Such
an exhaustive technique is not scalable when the number of models is large. Approximate model
counting uses a sampling method to estimate the number of feasible models of the formula without
enumerating all models. The recent hash-based model counting proposed by Chakraborty et al. [15]
improves scalability and provides proven lower and upper bounds with a confidence guarantee
in a statistical sense.
2.4.1 Hash-based approximate model counting
The hash-based approximate model counting technique due to Chakraborty et al. [15] leverages
a family of 3-wise independent hash functions to estimate the number #F of satisfying assignments
of a conjunctive-normal-form (CNF) proposition F of v variables and runs in fully polynomial time
with respect to a SAT oracle. At a high level, this algorithm iteratively selects a random hash
function $H_b : \{0,1\}^v \to \{0,1\}^b$ from a family (where b changes per iteration) and a random
$p \in \{0,1\}^b$, and computes the satisfying assignments for F for which the hash of the assignment
(a string in $\{0,1\}^v$) is p. (Intuitively, this number should be about $\#F/2^b$.) Through judicious
management of this iterative process, the algorithm arrives at an estimate $\widetilde{\#F}$ for #F that satisfies
\[ P\left( (1+\epsilon)^{-1} \cdot \#F \;\le\; \widetilde{\#F} \;\le\; (1+\epsilon) \cdot \#F \right) \;\ge\; \delta \]
where error $\epsilon$, $0 < \epsilon \le 1$, and confidence $\delta$, $0 < \delta \le 1$, are parameters and the probability is taken
with respect to the random choices of the algorithm.
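The intuition that a random cell holds about #F/2^b models can be sketched as follows. This is a heavily simplified illustration and not the algorithm of [15]: brute-force enumeration stands in for the SAT oracle, b is fixed by the caller instead of being searched adaptively, and no (ε, δ) guarantee is claimed. The hash is built from random affine parity (XOR) constraints, a common choice for such families; the function names and the example formula are hypothetical.

```python
import random
from itertools import product
from statistics import median


def random_affine_hash(v, b, rng):
    """Draw h(x) = Ax + c over GF(2), mapping v bits to b bits."""
    rows = [[rng.randrange(2) for _ in range(v)] for _ in range(b)]
    consts = [rng.randrange(2) for _ in range(b)]

    def h(x):
        return tuple(
            (sum(a & xi for a, xi in zip(row, x)) + c) % 2
            for row, c in zip(rows, consts)
        )

    return h


def estimate_once(formula, v, b, rng):
    """Count the models in one random cell and scale the count by 2^b."""
    h = random_affine_hash(v, b, rng)
    p = tuple(rng.randrange(2) for _ in range(b))
    cell = sum(1 for x in product((0, 1), repeat=v) if formula(x) and h(x) == p)
    return cell * 2 ** b


def estimate(formula, v, b, rounds, rng):
    """Median of several independent single-cell estimates."""
    return median(estimate_once(formula, v, b, rng) for _ in range(rounds))


def at_least_two_ones(x):
    return sum(x) >= 2


V = 10
exact = sum(1 for x in product((0, 1), repeat=V) if at_least_two_ones(x))
approx = estimate(at_least_two_ones, V, b=4, rounds=9, rng=random.Random(0))
```

With b = 0 there is a single cell and the estimate coincides with the exact count; larger b trades accuracy for smaller cells, which is what makes the approach attractive when each cell is enumerated by a SAT solver instead of by brute force.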
Previous QIF-related works leveraging model counting either support only convex constraints
(e.g., [7, 76]) and therefore do not capture all constraints of realistic applications, or use exact
counters (e.g., [77]) and so cannot scale to complex applications. In contrast, Chapter 4 leverages
principled sampling-based methods for counting purposes, which we show can be used to expose
leaks in real codebases. Chapter 4 also demonstrates a new approach for using model counting to
estimate information leakage based on the noninterference property, again deriving from a strategy of
counting pairs of attacker-controlled inputs and attacker-observable outputs conditioned on secret
value sets of different sizes, in contrast to these prior works.
2.4.2 Projected model counting
The projected model counting problem (#∃SAT) counts feasible assignments of selected variables in a
propositional formula. For a realistic application, the automatically generated CNF formula
introduces auxiliary variables in order to support the encoding of different operations. Thus, our
counting task is projected model counting. The algorithm used in model counting can be applied
to projected model counting through a minor modification. In Chapter 4, instead of applying a hash
constraint to all variables in F, the implementation applies the hash function only over the counting
targets C, I, and O.
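The difference between counting all models and counting projected models can be seen on a tiny formula. The sketch below uses brute-force enumeration purely for illustration; the formula, the variable indices, and the function name are hypothetical and unrelated to the encodings used in Chapter 4.

```python
from itertools import product


def projected_count(formula, num_vars, projection):
    """Count distinct assignments to `projection` that extend to a model."""
    seen = set()
    for x in product((0, 1), repeat=num_vars):
        if formula(x):
            seen.add(tuple(x[i] for i in projection))
    return len(seen)


def example_formula(x):
    """(x0 OR x1) over three variables; x2 is an unconstrained extra variable."""
    return bool(x[0] or x[1])


full = projected_count(example_formula, 3, (0, 1, 2))   # ordinary model count
projected = projected_count(example_formula, 3, (0, 1)) # count over x0, x1 only
```

Each assignment to the selected variables is counted once no matter how many ways the remaining variables can complete it, which is why the projected count (3) is smaller than the full count (6) here.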
CHAPTER 3: A DEFENSE AGAINST CACHE-BASED SIDE CHANNELS WITH
EMPIRICAL SECURITY1
This chapter focuses on information leakage through the cache-based side channels in LLCs. We
expect to use this dedicated example to emphasize the importance of measuring with static analysis,
by comparing the empirical measure and static verification based on a formal model.
Recall that Flush+Reload, Prime+Probe, and their variants are the most commonly used
attack vectors in cache-based side channels. We propose a software-only defense against these
LLC-based side-channel attacks, based on two seemingly straightforward principles. First, to de-
feat Flush+Reload attacks, we propose a copy-on-access mechanism to manage physical pages
shared across mutually distrusting security domains (i.e., processes, containers2, or virtual
machines (VMs)). Specifically, temporally proximate accesses to the same physical page by multiple
security domains result in the page being copied so that each domain has its own copy. In this
way, a victim’s access to its copy will be invisible to an attacker’s Reload in a Flush+Reload
attack. When accesses are sufficiently spaced in time, the copies can be deduplicated to return the
overall memory footprint to its original size. Second, to defeat Prime+Probe attacks, we design a
mechanism to manage the cacheability of memory pages so as to limit the number of lines per cache
set that an attacker may Probe. In doing so, we limit the attacker’s visibility into the victim’s
demand for memory that maps to that cache set.
Of course, an essential part of defense work is to demonstrate its effectiveness through a reasonable
security evaluation. To do so, we first detail the design and implementation of a memory management
subsystem called CacheBar (short for “Cache Barrier”) for the Linux kernel. CacheBar supports
these defenses for security domains represented as Linux containers. That is, copy-on-access
to defend against Flush+Reload attacks makes page copies as needed to isolate temporally




proximate accesses to the same page from different containers. Moreover, memory cacheability is
managed so that the processes in each container are collectively limited in the number of lines per
cache set they can Probe. CacheBar would thus be well-suited for use in platform-as-a-service
(PaaS) clouds that isolate cloud customers in distinct containers. With a concrete implementation,
we design a quantitative empirical evaluation to measure the attacker’s ability to distinguish a
victim’s behavior in a PaaS cloud environment. Our empirical results show that CacheBar
effectively restricts the leakage in cache-based side-channel attacks. In addition, we build a formal
model for copy-on-access and use model checking to check for potential interference. Model checking
covers more vulnerable information flows than rerunning concrete attacks, and its results reveal
the incompleteness of the empirical evaluation.
3.1 Copy-On-Access
3.1.1 Design
Modern operating systems, in particular Linux OS, often adopt on-demand paging and copy-
on-write mechanisms to reduce the memory footprints of userspace applications. In particular,
copy-on-write enables multiple processes to share the same set of physical memory pages as long as
none of them modify the content. If a process writes to a shared memory page, the write will trigger
a page fault and a subsequent new page allocation so that a private copy of the page will be provided
to this process. In addition, memory merging techniques like kernel same-page merging (KSM) [5]
are also used in Linux OS to deduplicate identical memory pages. Memory sharing, however, is
one of the key factors that enable Flush+Reload side-channel attacks. Disabling memory page
sharing entirely will eliminate Flush+Reload side channels but at the cost of much larger memory
footprints and thus inefficient use of physical memory.
CacheBar adopts a design that we call copy-on-access, which dynamically controls the shar-
ing of physical memory pages between security domains. We designate each physical page as being
in exactly one of the following states: unmapped, exclusive, shared, and accessed. An un-
mapped page is a physical page that is not currently in use. An exclusive page is a physical page
that is currently used by exactly one security domain, but may be shared by multiple processes in
that domain. A shared page is a physical page that is shared by multiple security domains, i.e.,
mapped by at least one process of each of the sharing domains, but no process in any domain has
accessed this physical page recently. In contrast, an accessed page is a previously shared page
that was recently accessed by a security domain. The state transitions are shown in Fig. 3.1.
Figure 3.1: Copy-On-Access State Transition
An unmapped page can transition to the exclusive state either due to normal page mapping,
or due to copy-on-access when a page is copied into it. Unmapping a physical page for any reason
(e.g., process termination, page swapping) will move an exclusive page back to the unmapped
state. However, mapping the current exclusive page by another security domain will transition it
into the shared state. If all but one domain unmaps this page, it will transition back from the
shared or accessed state to the exclusive state. A page in the
shared state may be shared by more domains and remain in the same state; when any one of the
domains accesses the page, it will transition to the accessed state. An accessed page can stay
that way as long as only the same security domain accesses it. If this page is accessed by another
domain, a new physical page will be allocated to make a copy of this one, and the current page will
transition to either exclusive or shared state, depending on the remaining number of domains
mapping this page. The new page will be assigned state exclusive. An accessed page will be
reset to the shared state if it is not accessed for ∆Tacc seconds. This timeout mechanism ensures
that only recently used pages will remain in the accessed state, limiting chances for unnecessary
duplication. Page merging may also be triggered by deduplication services in a modern OS (e.g.,
KSM in Linux). This effect is reflected by a dashed line in Fig. 3.1 from state exclusive to
shared. A page in any of the mapped states (i.e., exclusive, shared, accessed) can transition
to the unmapped state for the same reason when it is a copy of another page (not shown in the figure).
Merging duplicated pages requires some extra bookkeeping. When a page transitions from
unmapped to exclusive due to copy-on-access, the original page is tracked by the new copy so
that CacheBar knows with which page to merge it when deduplicating. If the original page is
unmapped first, then one of its copies will be designated as the new “original” page, with which
other copies will be merged in the future. The interaction between copy-on-access and existing
copy-on-write mechanisms is also implicitly depicted in Fig. 3.1: Upon copy-on-write, the triggering
process will first unmap the physical page, possibly inducing a state transition (from shared to
exclusive). The state of the newly mapped physical page is maintained separately.
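The transitions described in this section can be summarized by a small executable model. It is a sketch under simplifying assumptions, not the kernel implementation: domains are plain strings, the ∆Tacc timer is an explicit method call, page merging and copy-on-write are omitted, and the behavior when the owning domain itself unmaps an accessed page (not specified above) is an assumption marked in the code.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Page:
    domains: Set[str] = field(default_factory=set)  # security domains mapping this page
    owner: Optional[str] = None                     # domain that accessed it most recently

    @property
    def state(self) -> str:
        if not self.domains:
            return "unmapped"
        if len(self.domains) == 1:
            return "exclusive"
        return "shared" if self.owner is None else "accessed"

    def map(self, domain: str) -> None:
        self.domains.add(domain)

    def unmap(self, domain: str) -> None:
        self.domains.discard(domain)
        # Assumption: the owner mark is dropped when the owner itself unmaps.
        if len(self.domains) <= 1 or self.owner == domain:
            self.owner = None

    def access(self, domain: str) -> "Page":
        """Return the physical page that `domain` uses after this access."""
        if domain not in self.domains:
            raise ValueError("domain has not mapped this page")
        if self.state == "shared":
            self.owner = domain                # shared -> accessed
        elif self.state == "accessed" and self.owner != domain:
            self.unmap(domain)                 # original: -> exclusive or shared
            self.owner = None
            return Page(domains={domain})      # the new copy starts out exclusive
        return self

    def timeout(self) -> None:
        """Called when the page has not been accessed for Delta T_acc seconds."""
        self.owner = None                      # accessed -> shared
```

For example, after domains "A" and "B" both map a page and "A" accesses it, a subsequent access by "B" returns a fresh exclusive copy for "B" and leaves the original page exclusive to "A".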
3.1.2 Implementation
At the core of the copy-on-access implementation is the state machine depicted in Fig. 3.1.
unmapped ⇔ exclusive ⇔ shared Conventional Linux kernels maintain the relationship between
processes and the physical pages they use. However, CacheBar also needs to keep track of
the relationship between containers and the physical pages that each container’s processes use.
Therefore, CacheBar incorporates a new data structure, counter, which is conceptually a table
used for recording, for each physical page, the number of processes in each container that have page
table entries (PTEs) mapped to this page.
The counter data structure is updated and referenced in multiple places in the kernel. Specifi-
cally, in CacheBar we instrumented every update of mapcount, a data field in the page structure
for counting PTE mappings, so that every time the kernel tracks the PTE mappings of a physical
page, counter is updated accordingly. The use of counter greatly simplifies maintaining and de-
termining the state of a physical page: (1) Given a container, access to a single cell suffices to check
whether a physical page is already mapped in the container. This operation is very commonly used
to decide if a state transition is required when a page is mapped by a process. Without counter,
such an operation requires performing reverse mappings to check the domain of each mapping. (2)
Given a physical page, it takes N accesses to counter, where N is the total number of containers,
to tell which containers have mapped to this page. This operation is commonly used to determine
the state of a physical page.
Figure 3.2: Structure of copy-on-access page lists.
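The two operations on counter described above can be sketched with a table indexed by page and container. This is an illustrative user-space model only; the class and method names are hypothetical, and the kernel data structure is not a Python dictionary.

```python
from collections import defaultdict


class MappingCounter:
    """Per page, the number of processes in each container that map it."""

    def __init__(self, containers):
        self.containers = list(containers)
        self.table = defaultdict(lambda: dict.fromkeys(self.containers, 0))

    def add_mapping(self, page, container):
        self.table[page][container] += 1

    def remove_mapping(self, page, container):
        if self.table[page][container] == 0:
            raise ValueError("no mapping to remove")
        self.table[page][container] -= 1

    def is_mapped_in(self, page, container):
        """Operation (1): a single cell lookup."""
        return self.table[page][container] > 0

    def mapping_containers(self, page):
        """Operation (2): one lookup per container, N in total."""
        return [c for c in self.containers if self.table[page][c] > 0]


counter = MappingCounter(["c1", "c2", "c3"])
counter.add_mapping(page=42, container="c1")
counter.add_mapping(page=42, container="c1")   # a second process in c1
counter.add_mapping(page=42, container="c3")
counter.remove_mapping(page=42, container="c1")
mapped_by = counter.mapping_containers(42)
```

The length of the list returned by the second operation (zero, one, or more containers) is what distinguishes an unmapped, an exclusive, and a shared or accessed page.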
shared ⇒ accessed To differentiate shared and accessed states, one additional data field,
owner, is added (see Fig. 3.2) to indicate the owner of the page (a pointer to a PID namespace
structure). When the page is in the shared state, its owner is NULL; otherwise it points to the
container that last accessed it.
All PTEs pointing to a shared physical page will have a reserved Copy-On-Access (COA) bit
set. Therefore, any access to these virtual pages will induce a page fault. When a page fault is
triggered, CacheBar checks if the page is present in physical memory; if so, and if the physical
page is in the shared state, the COA bit of the current PTE for this page will be cleared so that
additional accesses to this physical page from the current process will be allowed without page
faults. The physical page will also transition to the accessed state.
accessed ⇒ exclusive/shared If the page is already in the accessed state when a domain other
than the owner accesses it, the page fault handler will allocate a new physical page, copy the
content of the original page into the new page, and change the PTEs in the accessing container
so that they point to the new page. Since multiple same-content copies in one domain burden
both performance and memory but contribute nothing to security, the fault handler will reuse a
copy belonging to that domain if it exists. After copy-on-access, the original page can either be
exclusive or shared. All copy pages are anonymous-mapped, since only a single file-mapped
page for the same file section is allowed.
A transition from the accessed state to shared or exclusive state can also be triggered
by a timeout mechanism. CacheBar implements a periodic timer (every ∆Tacc = 1s). Upon
timer expiration, all physical pages in the accessed state that were not accessed during this
∆Tacc interval will be reset to the shared state by clearing their owner fields, so that pages that are
infrequently accessed are less likely to trigger copy-on-access. If an accessed page is found for
which its counter shows the number of domains mapped to it is 1, then the daemon instead clears
the COA bit of all PTEs for that page and marks the page exclusive.
Instead of keeping a list of accessed pages, CacheBar maintains a list of pages that are in
either shared or accessed state, denoted original list (shown in Fig. 3.2). Each node in the
list also maintains a list of copies of the page it represents, dubbed copy list. These lists are
attached onto the struct page through track ptr. Whenever a copy is made from the page upon
copy-on-access, it is inserted into the copy list of the original page. Whenever a physical page
transitions to the unmapped state, it is removed from whichever of original list or copy list
it is contained in. In the former case, CacheBar will designate a copy page of the original page
as the new original page and adjust the lists accordingly.
For security reasons that will be explained in Sec. 3.3.1(a), we further require flushing the
entire memory page out of the cache after transitioning a page from the accessed state to the
shared state due to this timeout mechanism. This page-flushing procedure is implemented by
issuing clflush on each of the memory blocks of any virtual page that maps to this physical page.
State transition upon clflush The clflush instruction is subject to the same permission checks
as a memory load, will trigger the same page faults, and will similarly set the ACCESSED bit in
the PTE of its argument [49]. As such, each Flush via clflush triggers the same transitions
(e.g., from shared to accessed, and from accessed to an exclusive copy) as a Reload in our
implementation, meaning that this defense is equally effective against both Flush+Reload and
Flush+Flush [47] attacks.
Page deduplication To mitigate the impact of copy-on-access on the size of memory, CacheBar
implements a less frequent timer (every ∆Tcopy = 10 × ∆Tacc seconds) to periodically merge the
page copies with their original pages. Within the timer interrupt handler, original list and
each copy list are traversed similarly to the “accessed ⇒ shared” transition description above,
though the ACCESSED bit is checked only in the PTEs of pages that are in the exclusive state.
If a copy page has not been accessed since the last such check (i.e., the ACCESSED bit is unset in
all PTEs pointing to it), it will be merged with its original page (the head of the copy list). The
ACCESSED bit in the PTEs will be cleared afterwards.
When merging two pages, if the original page is anonymous-mapped, then the copy page can
be merged by simply updating all PTEs pointing to the copy page to instead point to the original
page, and then updating the original page’s reverse mappings to include these PTEs. If the original
page is file-mapped, then merging is more intricate, additionally involving the creation of a new
virtual memory area (vma structure) that maps to the original page’s file position and using this
structure to replace the virtual memory area of the (anonymous) copy page in the relevant task
structure.
For security reasons, merging of two pages requires flushing the original physical page from the
LLC. We will elaborate on this point in Sec. 3.3.1(a).
Interacting with KSM Page deduplication can also be triggered by existing memory dedupli-
cation mechanisms (e.g., KSM). To maintain the state of physical pages, CacheBar instruments
every reference to mapcount within KSM and updates counter accordingly. KSM is capable
of merging more pages than our built-in page deduplication mechanisms. However, CacheBar
still relies on the built-in page deduplication mechanisms for several reasons. First, KSM can
merge only anonymous-mapped pages, while CacheBar needs to frequently merge an anonymous-
mapped page (a copy) with a file-mapped page (the original). Second, KSM may not be enabled
in certain settings, which will lead to ever growing copy lists. Third, KSM must compare page
contents byte-by-byte before merging two pages, whereas CacheBar deduplicates pages on the
same copy list, avoiding the expensive page content comparison.
3.2 Cacheability Management
A potentially effective countermeasure to Prime+Probe attacks is to remove the attacker’s
ability to Prime and Probe the whole cache set and to predict how a victim’s demand for that
set will be reflected in the number of evictions from that set.
3.2.1 Design
Suppose a w-way set associative LLC, so that each cache set has w lines. Let x be the number of
cache lines in one set that the attacker observes having been evicted in a Prime+Probe interval
(i.e., x ∈ VarsO). The Prime+Probe attack is effective today because x is typically a good
indicator of the demand d that the victim security domain had for memory that mapped to that
cache set during the Prime+Probe interval (i.e., d ∈ VarsS). In particular, if the attacker Primes
and Probes all w lines, then it can often observe the victim’s demand d exactly, unless d > w (in
which case the attacker learns at least d ≥ w).
Figure 3.3: A cacheable queue for one page color in a domain: (a) access to page 24 brings it into
the queue and clears NC bit (“← 0”) in the PTE triggering the fault; periodically, (b) a daemon
counts the ACCESSED bits (“+0”, “+1”) per page and (c) reorders pages accordingly; to make
room for a new page, (d) NC bits in PTE pointing to the least recently used page are set, and the
page is removed from the queue.
Here we propose to periodically and probabilistically reconfigure the budget ki of lines per cache
set that the security domain i can occupy. After such a reconfiguration, the attacker’s view of the
victim’s demand d is clouded by the following three effects. First, if the attacker is allotted a budget
ka < w, then the attacker will be unable to observe any evictions at all (i.e., x = 0) if d < w − ka.³
Second, if the victim is given allotment kv, then any two victim demands d, d′ satisfying d > d′ ≥ kv
will be indistinguishable to the attacker. Third, the probabilistic assignment of kv results in extra
ambiguity for the attacker, since x evictions might reflect the demand d or the budget kv, since
x ≤ min{d, kv} (if all x evictions are caused by the victim).
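The first two effects can be illustrated with a small sketch. It assumes, as the accompanying footnote does, an LRU policy and a victim that runs alone in the Prime+Probe interval, and it uses the relation developed in Sec. 3.2.2 that the attacker then observes max(0, ka + min{kv, d} − w) evictions. The function name is ours and is not part of CacheBar.

```python
def observed_evictions(d: int, k_v: int, k_a: int, w: int) -> int:
    """Evictions the attacker observes when the victim demands d lines.

    Assumes LRU replacement and that only the victim runs during the
    Prime+Probe interval. The victim may occupy at most k_v lines and the
    attacker has primed k_a of the w lines in the set.
    """
    victim_lines = min(k_v, d)      # demand beyond the budget stays invisible
    return max(0, k_a + victim_lines - w)


w = 16
# Effect 1: with k_a < w, demands d < w - k_a cause no observable eviction.
print([observed_evictions(d, k_v=16, k_a=12, w=w) for d in range(6)])
# Effect 2: two demands d > d' >= k_v look identical to the attacker.
print(observed_evictions(9, k_v=8, k_a=16, w=w),
      observed_evictions(14, k_v=8, k_a=16, w=w))
```

The third effect arises once kv itself is drawn at random: a given x is then consistent with several (d, kv) combinations.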
To enforce the budget ki of lines that security domain i can use in a given cache set, CacheBar
maintains for each cache set a queue per security domain that records which memory blocks are
³ This statement assumes an LRU replacement policy and that the victim is the only security domain that runs in
the Prime+Probe interval. If it were not the only security domain to run, then the ambiguity of the observable
evictions would additionally cause difficulties for the attacker.
presently cacheable in this set by processes in this domain. Each element in the queue indicates a
memory block that maps to this cache set; only blocks listed in the queue can be cached in that
set. The queue is maintained with a least recently used (LRU) replacement algorithm. That is,
whenever a new memory block is accessed, it will replace the memory block in the corresponding
queue that is the least recently used.
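A minimal sketch of such a queue is given below, assuming exact LRU bookkeeping on every access (the implementation in Sec. 3.2.3 only approximates this). The class and method names are hypothetical and chosen for illustration.

```python
from collections import OrderedDict
from typing import List, Optional


class CacheableQueue:
    """Per-domain, per-set record of the blocks that may currently be cached."""

    def __init__(self, budget: int) -> None:
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.budget = budget
        self._blocks: "OrderedDict[int, None]" = OrderedDict()

    def access(self, block: int) -> Optional[int]:
        """Record an access; return the block that lost cacheability, if any."""
        if block in self._blocks:
            self._blocks.move_to_end(block)      # now the most recently used
            return None
        evicted = None
        if len(self._blocks) >= self.budget:
            evicted, _ = self._blocks.popitem(last=False)  # least recently used
        self._blocks[block] = None
        return evicted

    def cacheable(self) -> List[int]:
        """Blocks currently allowed in the cache, least recently used first."""
        return list(self._blocks)


q = CacheableQueue(budget=2)
for b in (1, 2, 1, 3):
    print(b, "->", q.access(b))
print(q.cacheable())
```

With a budget of 2, the access to block 3 displaces block 2, because block 1 was touched more recently.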
3.2.2 Dynamic budget ki of cache lines
Suppose there are (at most) m domains on a host that are owned by the attacker—which might
be all domains on the host except the victim—and let w be the number of cache lines per LLC set.
Below we consider domain 0 to be the “victim” domain being subjected to Prime+Probe attacks
by the “attacker” domains 1, . . . , m. Of course, the attacker domains make use of all $\sum_{i=1}^{m} k_i$ cache
lines available to them for conducting their Prime+Probe attacks.
Periodically, CacheBar draws a new value ki for each security domain i. This drawing is
memoryless and independent of the draws for other security domains. Let Ki denote the random
variable distributed according to how ki is determined. The random variables that we presume
can be observed by the attacker domains include K1, . . . , Km; let $K_a = \min\{w, \sum_{i=1}^{m} K_i\}$ denote
the number of cache lines allocated to the attacker domains. We also presume the attacker can
accurately measure the number X of its cache lines that are evicted during the victim’s execution.
Let Pd (E) denote the probability of event E in an execution period during which the victim’s
cache usage would populate d lines (of this color) if it were allowed to use all w lines, i.e., if k0 = w.
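We (the defender) would like to distribute K0, . . . , Km so as to minimize the statistical distance
between the eviction distributions that the attacker observes for different victim demands d, d′, i.e.,
$$\sum_{0 \le d < d' \le w} \; \sum_{x} \left| P_d(X = x) - P_{d'}(X = x) \right| \qquad (3.1)$$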
We begin by deriving an expression for Pd (X = x). Below we make the conservative assumption
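that all evictions are caused by the victim’s behavior; in reality, caches are far noisier. We first
consider the case x = 0, i.e., the case in which the attacker domains observe no evictions:
$$P_d(X = 0 \mid K_0 = k_0 \wedge K_a = k_a) = \begin{cases} 1 & \text{if } w \ge k_a + \min\{k_0, d\} \\ 0 & \text{otherwise} \end{cases}$$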
“min{k0, d}” is used above because any victim demand for memory blocks that map to this cache
set beyond k0 will back-fill the cache lines invalidated when CacheBar flushes other blocks from
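the victim’s cacheability queue, rather than evicting others. Since K0 and Ka are independent,
$$P_d(X = 0) = \sum_{k_0 = 0}^{d} \sum_{k_a = 0}^{w - k_0} P(K_0 = k_0) \cdot P(K_a = k_a) + \sum_{k_0 = d+1}^{w} \sum_{k_a = 0}^{w - d} P(K_0 = k_0) \cdot P(K_a = k_a) \qquad (3.2)$$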
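Note that we have dropped the “d” subscripts from the probabilities on the right, since K0 and Ka
are distributed independently of d. Moreover, since K1, . . . , Km are independent,
$$P(K_a = k_a) = \begin{cases} \sum_{k_1 + \cdots + k_m = k_a} \prod_{i=1}^{m} P(K_i = k_i) & \text{if } k_a < w \\ \sum_{k_1 + \cdots + k_m \ge w} \prod_{i=1}^{m} P(K_i = k_i) & \text{if } k_a = w \end{cases} \qquad (3.3)$$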
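Similarly, for x ≥ 1,
$$P_d(X = x \mid K_0 = k_0 \wedge K_a = k_a) = \begin{cases} 1 & \text{if } x + w = k_a + \min\{k_0, d\} \\ 0 & \text{otherwise} \end{cases}$$
and so for x ≥ 1,
$$P_d(X = x) = \sum_{k_0 = 0}^{d} P(K_0 = k_0) \cdot P(K_a = x + w - k_0) + \sum_{k_0 = d+1}^{w} P(K_0 = k_0) \cdot P(K_a = x + w - d) \qquad (3.4)$$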
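The derivation can be checked numerically with a short sketch. It relies on the same conservative assumption that the victim causes all evictions, so that X = max(0, Ka + min{K0, d} − w). As a further assumption of ours, the objective sums |Pd(X = x) − Pd′(X = x)| over x and over all demand pairs d < d′. The function names are illustrative and do not come from the CacheBar code.

```python
from itertools import combinations


def attacker_budget_pmf(pmfs, w):
    """PMF of K_a = min(w, K_1 + ... + K_m) for independent K_i."""
    total = [1.0] + [0.0] * w              # PMF of the empty sum
    for pmf in pmfs:
        nxt = [0.0] * (w + 1)
        for s, ps in enumerate(total):
            for k, pk in enumerate(pmf):
                nxt[min(w, s + k)] += ps * pk   # cap the running sum at w
        total = nxt
    return total


def eviction_pmf(d, p0, pa, w):
    """P_d(X = x) for x = 0..w, assuming the victim causes all evictions."""
    px = [0.0] * (w + 1)
    for k0, pk0 in enumerate(p0):
        for ka, pka in enumerate(pa):
            px[max(0, ka + min(k0, d) - w)] += pk0 * pka
    return px


def objective(p0, pa, w):
    """Sum over d < d' and x of |P_d(X = x) - P_d'(X = x)|."""
    dists = [eviction_pmf(d, p0, pa, w) for d in range(w + 1)]
    return sum(abs(a - b)
               for da, db in combinations(dists, 2)
               for a, b in zip(da, db))


w = 4
uniform = [0.0, 0.0, 1 / 3, 1 / 3, 1 / 3]      # K uniform on {2, 3, 4}
pa = attacker_budget_pmf([uniform], w)          # a single attacker domain
print(eviction_pmf(4, uniform, pa, w))
print(objective(uniform, pa, w))
```

Feeding in the degenerate distribution P(Ki = 0) = 1 yields an objective of 0, since no demand is then observable.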
From here, we proceed to solve for the best distribution for K0, . . . , Km to minimize (3.1) subject
to constraints (3.2)–(3.4). That is, we specify those constraints, along with
$$\forall i : \sum_{k_i = 0}^{w} P(K_i = k_i) = 1 \qquad (3.6)$$
$$\forall i, k_i : P(K_i = k_i) \ge 0 \qquad (3.7)$$
and then solve for each P (Ki = ki) to minimize (3.1).
Unfortunately, solving to minimize (3.1) alone simply results in a distribution that results in
no use of the cache at all (e.g., P (Ki = 0) = 1 for each i). As such, we need to rule out such
degenerate and “unfair” cases:
∀i : P (Ki < w/(m + 1)) = 0 (3.8)
Also, to encourage cache usage, we counterbalance (3.1) with a second goal that values greater
use of the cache. We express this goal as minimizing the earth mover’s distance [35] from the
distribution that always grants the full cache set, i.e., P (K0 = w) = 1:
$$\sum_{k=0}^{w} (w - k) \cdot P(K_0 = k) \qquad (3.9)$$
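As a small illustration, the sketch below checks a candidate distribution against constraints (3.6)–(3.8) and evaluates expression (3.9). It assumes the distribution is given as a list indexed by k = 0..w; both helper names are hypothetical.

```python
def is_valid_budget_pmf(pmf, w, m, tol=1e-9):
    """Check (3.6)-(3.8): a PMF over 0..w with no mass below w/(m+1)."""
    if len(pmf) != w + 1 or any(p < 0 for p in pmf):
        return False
    if abs(sum(pmf) - 1.0) > tol:
        return False
    # k < w/(m+1) is tested in integers as k*(m+1) < w.
    return all(p <= tol for k, p in enumerate(pmf) if k * (m + 1) < w)


def cache_usage_cost(pmf, w):
    """Expression (3.9): sum over k of (w - k) * P(K0 = k)."""
    return sum((w - k) * p for k, p in enumerate(pmf))


w, m = 8, 1
uniform_top = [0.0] * 4 + [0.2] * 5     # uniform on {4, ..., 8}
always_zero = [1.0] + [0.0] * w         # the degenerate "no cache" choice
print(is_valid_budget_pmf(uniform_top, w, m), cache_usage_cost(uniform_top, w))
print(is_valid_budget_pmf(always_zero, w, m), cache_usage_cost(always_zero, w))
```

The degenerate distribution is rejected by (3.8) and also incurs the largest possible cost under (3.9), namely w.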
As such, the final optimization problem seeks to balance (3.1) and (3.9). Let constant ϕ denote
the maximum (i.e., worst) possible value of (3.1) (i.e., when P (Ki = w) = 1 for each i) and α
denote the maximum (i.e., worst) possible value of (3.9) (i.e., when P (Ki = 0) = 1 for each i).
Then, given a parameter ρ, 0 < ρ < 1, our optimization computes distributions for K0, . . . ,Km so


























The evaluation in Sec. 3.3.2 empirically characterizes the security that results from setting ρ =
0.01, the default setting in CacheBar.
3.2.3 Implementation
Implementation of cacheable queues is processor micro-architecture dependent. Here we focus
our attention on Intel x86 processors, which appear to be more vulnerable to Prime+Probe
attacks due to their inclusive last-level cache [68]. As x86 architectures only support memory management
at the page granularity (e.g., by manipulating the PTEs to cause page faults), CacheBar
controls the cacheability of memory blocks at page granularity. CacheBar uses reserved bits in
each PTE to manage the cacheability of, and to track accesses to, the physical page to which it
points, since a reserved bit set in a PTE induces a page fault upon access to the associated virtual
page, for which the backing physical page cannot be retrieved or cached (if it is not already) before
the bit is cleared [49, 80]. We hence use the term domain-cacheable to refer to a physical page that
is “cacheable” in the view of all processes in a particular security domain, which is implemented by
modifying all relevant PTEs (to have no reserved bits set) in the processes of that security domain.
By definition, a physical page that is domain-cacheable to one container may not necessarily be
domain-cacheable to another.
To ensure that no more than ki memory blocks from all processes in container i can occupy lines
in a given cache set, CacheBar ensures that no more than ki of those processes’ physical memory
pages whose contents can be stored in that cache set are domain-cacheable at any point in time.
Physical memory pages whose contents can be stored in the same cache set are said to be of the
same color, and so to implement this property, CacheBar maintains, per container and per color
(rather than per cache set), one cacheable queue, each element of which is a physical memory page
that is domain-cacheable in this container. Since the memory blocks in each physical page map to
different cache sets, limiting the domain-cacheable pages of a color to ki also limits the number of
cache lines that blocks from these pages can occupy in the same cache set to ki.
To implement a non-domain-cacheable memory, CacheBar uses one reserved bit, which we
denote by NC, in all PTEs within the domain mapped to that physical page. As such, accesses to
any of these virtual pages will be trapped into the kernel and handled by the page fault handler.
Upon detecting page faults of this type, the page fault handler will move the accessed physical page
Figure 3.4: Page fault handler for CacheBar.
into the corresponding cacheable queue, clear the NC bit in the current PTE⁴, and remove a least
recently used physical page from the cacheable queue and set the NC bits in this domain’s PTEs
mapped to that page. A physical page removed from the cacheable queue will be flushed out of the
cache using clflush instructions on all of its memory blocks to ensure that no residue remains in
the cache. CacheBar will flush the translation lookaside buffers (TLB) of all processors to ensure
the correctness of page cacheabilities every time PTEs are altered. In this way, CacheBar limits
the number of domain-cacheable pages of a single color at any time to ki.
To maintain the LRU property of the cacheable queue, a daemon periodically re-sorts the queue
in descending order of recent access count. Specifically, the daemon traverses the domain’s page
table entries mapped to the physical frame within that domain’s queue and counts the number
having their ACCESSED bit set, after which it clears these ACCESSED bits. It then orders the
physical pages in the cacheable queue by this count (see Fig. 3.3). In our present implementation,
this daemon is the same daemon that resets pages from the accessed state to shared state
(see Sec. 3.1), which already checks and resets the ACCESSED bits in copies’ PTEs. Again, this
daemon runs every ∆Tacc = 1 s in our implementation. This daemon also performs the task
of resetting ki for each security domain i, each time it runs.
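The daemon’s re-sorting step might be sketched as follows. The Pte and Page classes are simplified stand-ins of ours for kernel structures, not actual Linux types, and only the count-clear-sort logic described above is modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pte:
    accessed: bool = False       # models the ACCESSED bit of one PTE


@dataclass
class Page:
    frame: int
    ptes: List[Pte] = field(default_factory=list)  # the domain's PTEs for this frame
    count: int = 0


def resort_queue(queue: List[Page]) -> None:
    """Count and clear ACCESSED bits, then sort by count in descending order.

    After sorting, the page at the end of the list is the least recently
    used one and therefore the next eviction candidate.
    """
    for page in queue:
        page.count = sum(pte.accessed for pte in page.ptes)
        for pte in page.ptes:
            pte.accessed = False
    queue.sort(key=lambda p: p.count, reverse=True)


queue = [
    Page(1, [Pte(False), Pte(False)]),
    Page(2, [Pte(True), Pte(True)]),
    Page(3, [Pte(True), Pte(False)]),
]
resort_queue(queue)
print([p.frame for p in queue])
```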
⁴ We avoid the overhead of traversing all PTEs in the container that map to this physical page. Access to those
virtual pages will trigger page faults to make these updates without altering the cacheable queue.
Interacting with copy-on-access The cacheable queues work closely with the copy-on-access
mechanisms. In particular, as both the COA and NC bits may trigger a page fault upon page
accesses, the page handler logic must incorporate both (see Fig. 3.4). First, a page fault is handled
as normal unless it is due to one of the reserved bits set in the PTE. As CacheBar is the only
source of reserved bits, it takes over page fault handling from this point. CacheBar first checks
the COA bit in the PTE. If it is set, the corresponding physical page is either shared, in which case
it will be transitioned to accessed, or accessed, in which case it will be copied and transitioned
to either shared or exclusive. CacheBar then clears the COA bit and, if no other reserved bits
are set, the fault handler returns. Otherwise, if the NC bit is set, the associated physical page is
not in the cacheable queue for its domain, and so CacheBar enqueues the page and, if the queue
is full, removes the least-recently-used page from the queue. If the NC bit is clear, this page fault is
caused by unknown reasons and CacheBar turns control over to the generic handler for reserved
bits.
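The decision flow just described can be summarized in a short sketch. It returns a textual trace of actions instead of manipulating real page tables. The OTHER flag is a hypothetical placeholder for a reserved bit that CacheBar did not set, and all names are ours.

```python
from enum import Flag, auto
from typing import List


class Reserved(Flag):
    NONE = 0
    COA = auto()      # copy-on-access bit
    NC = auto()       # non-cacheable bit
    OTHER = auto()    # placeholder for a reserved bit of unknown origin


def handle_fault(bits: Reserved, page_state: str) -> List[str]:
    """Return the ordered actions for a fault on a PTE with `bits` set."""
    if bits == Reserved.NONE:
        return ["default page fault handling"]
    actions = []
    if Reserved.COA in bits:
        if page_state == "shared":
            actions.append("transition page to accessed")
        elif page_state == "accessed":
            actions.append("copy page; transition to shared or exclusive")
        actions.append("clear COA")
        bits &= ~Reserved.COA
        if bits == Reserved.NONE:
            return actions
    if Reserved.NC in bits:
        actions.append("enqueue page; evict LRU page if queue is full")
        actions.append("clear NC")
    else:
        actions.append("generic reserved-bit handler")
    return actions


print(handle_fault(Reserved.COA | Reserved.NC, "accessed"))
```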
3.3 Security Evaluation
In this section, we empirically evaluate the effectiveness of CacheBar in defending against
both Flush+Reload and Prime+Probe attacks.
Our testbed is a rack mounted DELL server equipped with two 2.67GHz Intel Xeon 5550
processors. Each processor contains 4 physical cores (hyperthreading disabled) sharing an 8MB
last-level cache (L3). Each core has a 32KB L1 data and instruction cache and a 256KB L2 unified
cache. The rack server is equipped with 128GB DRAM and 1000MB NIC connected to a 1000MB
ethernet.
We implemented CacheBar as a kernel extension for Linux kernel 3.13.11.6 that runs Ubuntu
14.04 server edition. We set up containers using Docker 1.7.1.
3.3.1 Flush+Reload attacks
We constructed a Flush+Reload covert channel between sender and receiver processes, which
were isolated in different containers. Both the sender and receiver were linked to a shared library,
libcrypto.so.1.0.0, and were pinned to run on different cores of the same socket, thus sharing
the same last-level cache. The sender ran in a loop, repeatedly accessing one memory location (the
beginning address of function AES_decrypt()). The receiver executed Flush+Reload attacks
on the same address, first Flushing the memory block out of the cache with the
clflush instruction and then Reloading the block by accessing it directly while measuring the
access latency. The interval between Flush and Reload was set to 2500 cycles. The experiment
was run for 500,000 Flush+Reload trials. We then repeated this experiment with the sender
accessing an unshared address, to form a baseline.
Figure 3.5: Reload timings in Flush+Reload attacks on a shared address vs. on an unshared
address
Fig. 3.5(a) shows the results of this experiment, when run over unmodified Linux. The three
horizontal lines forming the “box” in each boxplot represent the first, second (median), and third
quartiles of the Flush+Reload measurements; whiskers extend to cover all points that lie within
1.5× the interquartile range. As can be seen in this figure, the times observed by the receiver
to Reload the shared address were clearly separable from the times to Reload the unshared
address, over unmodified Linux. With CacheBar enabled, however, these measurements are no
longer separable (Fig. 3.5(b)). Certain corner cases are not represented in Fig. 3.5. For example,
we found it extremely difficult to conduct experiments to capture the corner cases where Flush
and Reload take place right before and after physical page mergers, as described in Sec. 3.3.1(a).
As such, we rely on our manual inspection of the implementation in these cases to check correctness
and argue these corner cases are very difficult to exploit in practice.
3.3.1(a) Model Checking Noninterference Copy-on-access is intuitively secure by design,
as no two security domains may access the same physical page at the same time, rendering a
general Flush+Reload attack seemingly impossible, as demonstrated in the previous section. To
show security formally, we subjected our design to model checking in order to ensure that copy-on-access
is secure against Flush+Reload attacks at every execution point. Model checking is an
approach to formally verify a specification of a finite-state concurrent system expressed as temporal
logic formulas, by traversing the finite-state machine defined by the model. In our study, we used
the Spin model checker, which offers efficient ways to model concurrent systems and verify temporal
logic specifications.
System modeling We model a physical page in Fig. 3.1 using a byte variable in the Promela
programming language, and two physical pages as an array of two such variables, named pages.
We model two security domains (e.g., containers), an attacker domain and a victim domain, as
two processes in Promela. Each process maps a virtual page, virt, to one of the physical pages.
The virtual page is modeled as an index to the pages array; initially virt for both the attacker
and the victim points to the first physical page (i.e., virt is 0). The victim process repeatedly sets
pages[virt] to 1, simulating a memory access that brings pages[virt] into cache. The attacker
process Flushes the virtual page by assigning 0 to pages[virt] and Reloads it by assigning 1 to
pages[virt] after testing if it already equals 1. Both the Flush and Reload operations are
modeled as atomic to simplify the state exploration.
We track the state and owner of the first physical page using another two variables, state and
owner. The first page is initially in the shared state (state is shared), and state transitions in
Fig. 3.1 are implemented by each process when they access the memory. For example, the Reload
code snippet run by the attacker is shown in Fig. 3.6. If the attacker has access to the shared
page (Line 3), versus an exclusive copy (Line 16), then it simulates an access to the page, which
either moves the state of the page to accessed (Line 10) if the state was shared (Line 9) or to
exclusive (Line 14) after making a copy (Line 13) if the state was already accessed and not
owned by the attacker (Line 12). Leakage is detected if pages[virt] is 1 prior to the attacker
setting it as such (Line 19), which the attacker tests in Line 18.
To model the dashed lines in Fig. 3.1, we implemented another process, called timer, in Promela
that periodically transitions the physical page back to shared state from accessed state, and peri-
odically with a longer interval, merges the two pages by changing the value of virt of each domain
back to 0, owner to none, and state to shared.
The security specification is stated as a noninterference property. Specifically, as the attacker
domain always Flushes the memory block (sets pages[virt] to 0) before Reloading it (setting
pages[virt] to 1), if the victim does not interfere with the attacker, the attacker should always find
pages[virt] to be 0 upon Reloading the page. The model checker checks for violation of this
property.
 1  atomic {
 2    if
 3    :: (virt == 0) ->
 4      if
 5      :: (state == UNMAPPED) ->
 6        assert(0)
 7      :: (state == EXCLUSIVE && owner != ATTACKER) ->
 8        assert(0)
 9      :: (state == SHARED) ->
10        state = ACCESSED;
11        owner = ATTACKER
12      :: (state == ACCESSED && owner != ATTACKER) ->
13        virt = 1; /* copy-on-access */
14        state = EXCLUSIVE
15      fi
16    :: else -> skip
17    fi;
18    assert(pages[virt] == 0);
19    pages[virt] = 1
20  }
Figure 3.6: Code snippet for Reload.
Automated verification We checked the model using Spin. Interestingly, our first checking
attempt suggested that the state transitions may leak information to a Flush+Reload attacker.
The leaks were caused by the timer process that periodically transitions the model to a shared
state. After inspecting the design and implementation, we found that there were two situations
that may cause information leaks. In the first case, when the timer transitions the state machine
to the shared state from the accessed state, if the prior owner of the page was the victim and
the attacker reloaded the memory right after the transition, the attacker may learn one bit of
information. In the second case, when the physical page was merged with its copy, if the owner of
the page was the victim before the page became shared, the attacker may reload it and again learn
one bit of information. Since in our implementation of CacheBar, these two state transitions are
triggered if the page (or its copy) has not been accessed for a while (roughly ∆Tacc and ∆Tcopy
seconds, respectively), the information leakage bandwidth due to each would be approximately
1/∆Tacc bits per page per second or 1/∆Tcopy bits per page per second, respectively.
We improved our CacheBar implementation to prevent this leakage by enforcing LLC flushes
(as described in Sec. 3.1.2) upon these two periodic state transitions. We adapted our model
accordingly to reflect such changes by adding one more instruction to assign pages[0] to be 0




3.3.2 Prime+Probe attacks
We evaluated the effectiveness of CacheBar against Prime+Probe attacks by measuring
its ability to interfere with a simulated attack. Because the machine architecture on which we
performed these tests had a w-way LLC with w = 16, we limited our experiments to only a single
attacker container (i.e., m = 1), but an architecture with a larger w could accommodate more.⁵
In our simulation, a process in the attacker container repeatedly performed Prime+Probe
attacks on a specific cache set, while a process in a victim container accessed data that were
retrieved into the same cache set at the rate of d accesses per attacker Prime+Probe interval.
The cache lines available to the victim container and attacker container, i.e., kv and ka respectively,
were fixed in each experiment. The calculations in Sec. 3.2.2 implied that kv and ka could take
on values from {4, 5, 6, . . . , 14}. In each test with fixed kv and ka, we allowed the victim to place
a demand of (i.e., retrieve memory blocks to fill) d ∈ {0, 1, 2, ..., 16} cache lines of the cache set
undergoing the Prime+Probe attack by the attacker. The attacker’s goal was to classify the
victim’s demand into one of six classes: none = {0}, one = {1}, few = {2, 3, 4}, some =
{5, 6, 7, 8}, lots = {9, 10, 11, 12}, and most = {13, 14, 15, 16}.
To make the attack easier, we permitted the attacker to know ka; i.e., the attacker trained a
different classifier per value of ka, with knowledge of the demand d per Prime+Probe trial, and
then tested against additional trial results to classify unknown victim demands. Specifically, after
training a naïve Bayes classifier on 500,000 Prime+Probe trials per (d, ka, kv) triple, we tested
it on another 500,000 trials. To filter out Probe readings due to page faults, excessively large
readings were discarded from our evaluation. The tests without CacheBar yielded the confusion
matrix in Fig. 3.7(a), with overall accuracy of 67.5%. In this table, cells with higher numbers
have lighter backgrounds, and so the best attacker would be one who achieves white cells along
the diagonal and dark-gray cells elsewhere. As can be seen there, classification by the attacker was
very accurate for d falling into none, one, or lots; e.g., d = 1 resulted in a classification of one
⁵ For example, on an Itanium 2 processor with a 64-way LLC, CacheBar could accommodate m = 3 or larger. That
said, we are unaware of prior works that have successfully conducted Prime+Probe attacks from multiple colluding
attackers, which would itself face numerous challenges (e.g., coordinating Probes by multiple processes).
                 Classification by attacker
            none  one   few   some  lots  most
d  none     .96   .04   .00   .00   .00   .00
   one      .01   .80   .19   .01   .00   .00
   few      .00   .16   .50   .30   .04   .00
   some     .00   .00   .07   .54   .34   .04
   lots     .00   .00   .00   .03   .84   .13
   most     .00   .00   .00   .03   .56   .41
(a) Without CacheBar

                 Classification by attacker
            none  one   few   some  lots  most
d  none     .33   .16   .26   .18   .04   .02
   one      .16   .36   .19   .19   .06   .04
   few      .13   .14   .40   .19   .09   .05
   some     .09   .10   .16   .37   .20   .07
   lots     .08   .06   .10   .16   .46   .13
   most     .10   .07   .18   .18   .18   .29
(b) With CacheBar
Figure 3.7: Confusion matrix of naïve Bayes classifier
with probability of 0.80. Other demands had lower accuracy, but were almost always classified into
adjacent classes; i.e., every class of victim demand was classified correctly or as an adjacent class
(e.g., d ∈ few was classified as one, few, or some) at least 96% of the time.
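In contrast, Fig. 3.7(b) shows the confusion matrix for a naïve Bayes classifier trained and tested
with CacheBar enabled. These values were calculated as
$$P(\mathsf{class} = c \mid d \in c') = \sum_{4 \le k_a, k_v \le 14} P(\mathsf{class} = c \mid d \in c' \wedge K_v = k_v \wedge K_a = k_a) \cdot P(K_a = k_a) \cdot P(K_v = k_v)$$
where class denotes the classification obtained by the adversary using the naïve Bayes classifier;
c, c′ ∈ {none, one, few, some, lots, most}; and P (Ka = ka) and P (Kv = kv) are calculated as
described in Sec. 3.2.2. The factor $P(\mathsf{class} = c \mid d \in c' \wedge K_v = k_v \wedge K_a = k_a)$
was measured empirically. Though space limits preclude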
reporting the full class confusion matrix for each kv, ka pair, the accuracy of the naïve Bayes
classifier per kv, ka pair, averaged over all classes c, is shown in Fig. 3.8. As in Fig. 3.7, cells with
larger values in Fig. 3.8 are more lightly colored, though in this case, the diagonal has no particular
significance. Intuitively, when the attacker and victim are each limited to fewer
lines in the cache set (i.e., small values of ka and kv, in the upper left-hand corner of Fig. 3.8), the
accuracy of the attacker should suffer, whereas when the attacker and victim are permitted to use
more lines of the cache (i.e., in the lower right-hand corner), the attacker’s accuracy should improve.
Fig. 3.8 supports these general trends.
                                   kv
 ka     4    5    6    7    8    9   10   11   12   13   14
  4   .18  .17  .17  .17  .17  .17  .17  .17  .36  .22  .33
  5   .19  .17  .30  .32  .27  .27  .20  .26  .33  .46  .39
  6   .17  .31  .24  .18  .21  .17  .20  .27  .43  .39  .41
  7   .17  .33  .22  .22  .19  .31  .33  .33  .46  .48  .54
  8   .33  .35  .32  .23  .43  .37  .43  .42  .32  .38  .49
  9   .20  .26  .31  .28  .44  .38  .34  .34  .46  .39  .56
 10   .41  .31  .27  .35  .50  .55  .53  .31  .53  .50  .62
 11   .45  .45  .40  .45  .47  .54  .54  .57  .67  .50  .50
 12   .55  .50  .59  .63  .49  .48  .54  .49  .56  .58  .57
 13   .55  .53  .68  .68  .54  .65  .52  .56  .57  .66  .66
 14   .53  .56  .45  .65  .46  .62  .48  .68  .55  .57  .53
Figure 3.8: Accuracy per values of kv and ka
Fig. 3.7(b) shows that CacheBar substantially degrades the adversary’s classification accuracy,
which overall is only 33%. Moreover, the adversary is not only wrong more often, but is also often
“more wrong” in those cases. That is, whereas Fig. 3.7(a) shows that each class of victim demand
was classified as that demand or an adjacent demand at least 96% of the time, this property no
longer holds true in Fig. 3.7(b). Indeed, the attacker’s best case in this regard is classifying victim
demand lots, which it classifies as some, lots, or most 75% of the time. In the case of a victim
demand of most, this number is only 47%.
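The figures quoted from Fig. 3.7 can be re-derived from the matrices with a few lines of arithmetic. The sketch below assumes that the overall accuracy without CacheBar weights the six classes equally; the helper names are ours.

```python
CLASSES = ["none", "one", "few", "some", "lots", "most"]

# Rows: true class of d. Columns: class chosen by the attacker.
WITHOUT = [
    [.96, .04, .00, .00, .00, .00],
    [.01, .80, .19, .01, .00, .00],
    [.00, .16, .50, .30, .04, .00],
    [.00, .00, .07, .54, .34, .04],
    [.00, .00, .00, .03, .84, .13],
    [.00, .00, .00, .03, .56, .41],
]
WITH = [
    [.33, .16, .26, .18, .04, .02],
    [.16, .36, .19, .19, .06, .04],
    [.13, .14, .40, .19, .09, .05],
    [.09, .10, .16, .37, .20, .07],
    [.08, .06, .10, .16, .46, .13],
    [.10, .07, .18, .18, .18, .29],
]


def mean_diagonal(matrix):
    """Accuracy when every class is weighted equally."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


def adjacent_mass(matrix, true_class):
    """Probability of choosing the true class or one of its neighbors."""
    i = CLASSES.index(true_class)
    return sum(p for j, p in enumerate(matrix[i]) if abs(i - j) <= 1)


print(round(mean_diagonal(WITHOUT), 3))
print(round(adjacent_mass(WITH, "lots"), 2), round(adjacent_mass(WITH, "most"), 2))
```

The first line reproduces the 67.5% accuracy without CacheBar, and the second reproduces the 75% and 47% adjacent-class rates with CacheBar.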
3.4 Summary
This chapter presented two techniques to defend against side-channel attacks via LLCs, namely
(i) copy-on-access for physical pages shared among multiple security domains, to interfere with
Flush+Reload attacks, and (ii) cacheability management for pages to limit the number of cache
lines per cache set that an adversary can occupy simultaneously, to mitigate Prime+Probe attacks.
Using formal analysis (model checking for copy-on-access, and probabilistic modeling for
cacheability management), we developed designs that mitigate side-channel attacks in our empirical
evaluations. We also learned a lesson that the experiment-based leakage measure covers fewer
leaks than a static analysis because its empirical data is limited to concrete attacks.
CHAPTER 4: STATIC ANALYSIS OF QUANTITATIVE NONINTERFERENCE¹
The quantitative noninterference evaluation for cache mitigation in Chapter 3 is
attack-specific. In this chapter, we propose a method that uses static analysis to measure
interference in software before it happens.
Our intuition draws from noninterference [44], which informally is achieved when the attacker-
controlled inputs and attacker-observable outputs are unchanged by the value of a secret input that
should not “interfere” with what the attacker can observe. In principle, for any secret S, we could
build pairs of all attacker-controlled inputs C with the attacker-observable O outputs that can
possibly result from assignments I(ivars) – we’ll call these the 〈C,O〉 pairs for S. If secrets S and S′
have different sets of 〈C,O〉 pairs then there will be inputs that reveal interference. Unfortunately,
for complex procedures, it is impractical to enumerate all 〈C,O〉 pairs for all possible secrets, so
previous explorations based on similar enumeration have been limited (see Sec. 2.3).
By leveraging techniques from approximate model counting [15], we show how to scalably esti-
mate the number of 〈C,O〉 pairs to a desired accuracy and confidence and—perhaps more to the
point—the number of 〈C,O〉 pairs that are consistent with one or both of two disjoint spaces of
secret values. Finding two spaces of secret values for which these counts suggest pairs consistent
with one but not both then reveals interference. Moreover, we will demonstrate the need to examine
samples of secrets of varying sizes, and show that small samples provide a more reliable indication
of the number of secret values about which information leaks, whereas larger samples provide more
insight into the amount of leakage of secret values. In doing so, we develop a powerful framework
for interference detection and assessment with the following strengths:
• The error in our assessment of a reported interference can be reduced, arbitrarily close to zero
in the limit, through greater computational investment. Specifically, by increasing the accuracy
¹ This chapter is excerpted from previously published work [100] coauthored with Ziyun Qian, Michael K. Reiter, and
Yinqian Zhang.
and confidence with which the number of 〈C,O〉 pairs consistent with sampled secrets is estimated,
and by increasing the number and variety of samples tested, the interference assessment
quantifiably improves.
• Our framework supports the derivation of values from its estimates that separately provide insight
into the number of secret values about which information leaks, and the amount of leakage about
those secrets. Within the context of particular applications, one type of leakage might be more
important than the other.
• Even for nondeterministic applications, our framework provides a robust assessment of noninter-
ference, by accounting for the nondeterministic factors (e.g., procedure inputs I other than the
secrets or attacker-controlled values).
We demonstrate our tool through its application in numerous scenarios. We first apply it to
selected, artificially small examples (microbenchmarks) to demonstrate its features. Then, we apply
it to assess leakage in several real-world examples.
• We apply our tool to detect leakage of web search query strings submitted to the Sphinx web
server on the basis of auto-complete response sizes returned to the client (i.e., even if the query
and response contents themselves are encrypted) [18]. We also leverage our tool to evaluate the
impact of various mitigation strategies on this leak, e.g., showing that based on the contents of
the searchable database, some seemingly stronger defenses offer little additional protection over
seemingly weaker ones.
• We use our tool to demonstrate the vulnerability leveraged in Compression Ratio Info-leak Made
Easy (CRIME) attacks [53], specifically that adaptive compression algorithms provide opportu-
nities for an attacker to test guesses about secrets that he cannot observe, if he can instead
observe the length of compressed strings containing both the secret and his guess. This case
study demonstrates the ability of our technique to effectively account for attacker-controlled
inputs, in contrast to many prior techniques (see Sec. 2.3). Specifically, we apply our tool to
both Gzip and the fixed-dictionary compression library Smaz to illustrate that they both leak
information about secrets to the adversary, but that Gzip leaks more information as the number
of adversary-controlled executions grows.
• We apply our tool to illustrate the tendency of Linux to leak TCP-session sequence numbers to
an off-path attacker [72, 79]. This is perhaps the most complex of the examples we consider, and
again illustrates the power of accounting for attacker-controlled variables. Moreover, we evaluate
two plausible defenses against this attack, one a hypothetical patch to Linux that we propose,
and another being simply to disable use of information that is central to the leak.
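The compression leak behind CRIME can be illustrated independently of our tool with a short sketch. It uses Python’s zlib module (DEFLATE, the algorithm family used by Gzip). The message layout, the secret, and the guesses are all made up for illustration and do not correspond to any real protocol or to the experiments in this chapter.

```python
import zlib


def observed_length(secret: bytes, guess: bytes) -> int:
    """Length of the compressed message, the only value the attacker sees."""
    # Hypothetical layout: a secret header followed by attacker-chosen text.
    message = b"Cookie: secret=" + secret + b"\r\nquery: secret=" + guess
    return len(zlib.compress(message))


secret = b"7f3a9c51d2e84b60"
right = observed_length(secret, secret)               # guess equals the secret
wrong = observed_length(secret, b"XQZWJKVYPLMNRTGH")  # guess shares nothing
print(right, wrong, right < wrong)
```

A correct guess repeats a long substring that is already in the compressor’s window, so the output is shorter; observing only the length therefore lets the attacker test guesses about the secret.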
This chapter first presents the methodology for interference measurement in Sec. 4.1. The
implementation of our tool is described in Sec. 4.2. We then use microbenchmarks in Sec. 4.3 to
demonstrate features of our approach, and apply our tool to real-world codebases in Sec. 4.4. Some
limitations of our approach are discussed in Sec. 4.5.
4.1 Quantitative Noninterference
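To measure the leakage about ‘secret’ from O, under the adversary’s chosen C, we consider the
following sets for each secret value s:
$$X_s = \{ \langle C, O, I \rangle \mid \Pi_{proc}(C, I, S, O) \wedge S(\text{‘secret’}) = s \}$$
$$Y_s = \{ \langle C, O \rangle \mid \exists I : \langle C, O, I \rangle \in X_s \}$$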
Ys is an indicator of how s influences the possible view of the adversary. For example, if
O is independent of ‘secret’ and so leaks nothing about the value of ‘secret’, regardless of how
the adversary chooses C, then Ys = Ys′ for any s, s′ ∈ S. To generalize from this example, let
$Y_S = \bigcup_{s \in S} Y_s$ and then consider the Jaccard distance of YS and YS′ for any two disjoint sets
S, S′ ⊆ S:
$$J(S, S') = \frac{\left| (Y_S \setminus Y_{S'}) \cup (Y_{S'} \setminus Y_S) \right|}{\left| Y_S \cup Y_{S'} \right|} \qquad (4.1)$$
(By convention, J(S, S′) = 0 if YS = YS′ = ∅.) On the one hand, J(S, S′) = 0 implies that YS = YS′
or, in other words, that an attacker cannot distinguish whether the secret S(‘secret’) is in S or
S′. On the other hand, J(S, S′) > 0 implies there is some 〈C,O〉 ∈ (YS \ YS′) ∪ (YS′ \ YS), and
so the attacker can potentially distinguish between ‘secret’ having a value in S and the case in
which it has a value in S′. J(S, S′) is an aggregate measure of leakage considering all possible
attacker-controlled input values, instead of a worst-case measure of interference caused by specific
attacker-controlled inputs.
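To make the definition concrete, the following is a minimal Python sketch that computes J(S, S′) by exhaustive enumeration over tiny domains. It is only an illustration: the tool described in this chapter works on logical postconditions with model counting, not enumeration, and the two toy procedures (`password_check`, `constant_output`) are hypothetical.

```python
from itertools import product

def observable_pairs(proc, secrets, controls, others=(None,)):
    """Y_S: every <C, O> pair consistent with some s in S and some I."""
    return {(c, proc(c, i, s)) for s, c, i in product(secrets, controls, others)}

def jaccard_distance(proc, set_a, set_b, controls, others=(None,)):
    """J(S, S') as in (4.1), computed by exhaustive enumeration."""
    y_a = observable_pairs(proc, set_a, controls, others)
    y_b = observable_pairs(proc, set_b, controls, others)
    union = y_a | y_b
    if not union:  # convention: J = 0 when both sets are empty
        return 0.0
    return len(y_a ^ y_b) / len(union)

# Hypothetical toy procedures (not taken from the case studies).
def password_check(c, i, s):
    return 1 if s == c else 0

def constant_output(c, i, s):
    return 0

controls = range(8)
j_leaky = jaccard_distance(password_check, {3}, {5}, controls)
j_safe = jaccard_distance(constant_output, {3}, {5}, controls)
print(j_leaky, j_safe)
```

For the password checker, the two views differ only in the pairs with C ∈ {3, 5}, so 4 of the 10 pairs in the union are distinguishing; the constant procedure yields identical views and hence a distance of zero.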
Unfortunately, it is generally infeasible to compute J(S, S′) for every disjoint pair S, S′ ⊆ S, or
even when S, S′ are restricted to being singleton sets. We can, however, estimate
\[ J_n = \operatorname*{avg}_{S, S' \,:\, |S| = |S'| = n \,\wedge\, S \cap S' = \emptyset} J(S, S') \tag{4.2} \]
to a high level of confidence by sampling disjoint sets S, S′ of size n (or of expected size n, as we
will discuss in Sec. 4.2.2) at random and computing J(S, S′) for each.
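The sampling idea can be sketched as follows, assuming a secret space small enough to enumerate and a procedure with no inputs in I. The function names and the fixed sample count are illustrative assumptions, not the parameters used in our implementation.

```python
import random
from statistics import mean

def y_set(proc, secrets, controls):
    """Y_S for a procedure with no 'other' inputs."""
    return {(c, proc(c, s)) for s in secrets for c in controls}

def estimate_jn(proc, secret_space, controls, n, samples=200, seed=0):
    """Estimate J_n as the mean of J(S, S') over random disjoint S, S' of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(samples):
        drawn = rng.sample(secret_space, 2 * n)  # 2n distinct secrets
        set_a, set_b = drawn[:n], drawn[n:]      # disjoint by construction
        y_a = y_set(proc, set_a, controls)
        y_b = y_set(proc, set_b, controls)
        union = y_a | y_b
        values.append(len(y_a ^ y_b) / len(union) if union else 0.0)
    return mean(values)

secret_space = list(range(16))
reveal_all = lambda c, s: s      # hypothetical: outputs the secret itself
reveal_none = lambda c, s: 0     # hypothetical: constant output
parity = lambda c, s: s % 2      # hypothetical: leaks one bit

print(estimate_jn(reveal_all, secret_space, [0], n=1))
print(estimate_jn(reveal_none, secret_space, [0], n=1))
print(estimate_jn(parity, secret_space, [0], n=1))
```

With n = 1, the procedure that outputs the secret always gives disjoint views (distance 1), the constant procedure always gives distance 0, and the parity procedure falls in between.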
Jaccard distance is not the only choice to measure the degree of dissimilarity between sets YS and
YS′ for S and S′ […], Tversky index (i.e., a parameterized generalization of the Jaccard
similarity […]), etc. Their corresponding distance metrics could also be used to measure the
interference. In this dissertation, we will use the Jaccard distance to measure the interference
between two secret sets.
4.1.1 The need to vary n
Consider an idealized situation in which a procedure leaks the equivalence class into which
S(‘secret’) falls, among a set of c “small” equivalence classes C1, . . . , Cc of equal size w. If
C = ⋃_{i=1}^{c} Ci, then the remaining elements C0 = S \ C form another, “large” equivalence class
(w < |C0|). Let $C^{sm}_S$ ⊆ {C1, . . . , Cc} denote the small equivalence classes of which S contains
elements and $C^{lg}_S$ ⊆ {C0} indicate whether S contains representatives of C0 (in which case
$C^{lg}_S$ = {C0}) or not (in which case $C^{lg}_S$ = {}). For simplicity, we assume below that |Y_{Ci}|
is the same for each i ∈ {0, 1, . . . , c}.
For the rest of this discussion, we treat the selection of s ∈ S and s ∈ S′ as the selection, with […]
[…]), using (4.4) and (4.3) for the numerator and denominator, respectively.
[…] ≈ 1 − 2nw/|S| to (4.3) to conclude
E(Jn) ≈ 2 − 2ncw […]
Thus, when n is small, E(Jn) is sensitive to the number of secrets cw = |C| about which there
is substantial leakage, but is insensitive to c and w individually, i.e., to the amount of leakage
about those secrets. As such, small n yields a measure Jn that best indicates the number of
secrets about which information leaks.
2 In reality, each Ci can be selected only w times in the drawing of S and S′, since S and S′ do not
intersect. This dependence should not affect our estimates much, however, provided that w is not too
small or n is small enough.
3 Let Xi = 1 if class Ci ∈ $C^{sm}_S$ and Xi = 0 otherwise. Then, P(Xi = 0) = (1 − w/|S|) […]
[…]
That is, Jn is sensitive to c and w individually when n is large. In this sense, we say that Jn for
large n is a better indicator for the amount of leakage about secrets.
Again, the above model is idealized; leakage from real procedures can be far more complex.
Still, this discussion provides insight into the utility of Jn and how it should be used. When n
is small, (4.5) grows as cw = |C| grows, and for any threshold t ∈ [0, 1] indicating “substantial”
leakage, the smallest n for which Jn ≥ t shrinks. This smallest n is thus a reflection of |C|, i.e.,
of the number of secrets about which information leaks. When n is large and for a fixed cw, (4.6)
grows as w shrinks,4 and for any threshold t ∈ [0, 1] indicating “substantial” leakage, the largest
n for which Jn ≥ t grows. This largest n is thus a reflection of w, i.e., of the amount of leakage
about those secrets. It is therefore natural to examine both min{n | Jn ≥ t} and max{n | Jn ≥ t}. To
define measures using these values that fall within [0, 1] and for which larger values indicate more
leakage, we define
\[ \eta^{min}_t = \begin{cases} 0 & \text{if } t > J^{max} \\ 1/\min\{n \mid J_n \ge t\} & \text{otherwise} \end{cases}
\qquad
\eta^{max}_t = \begin{cases} 0 & \text{if } t > J^{max} \\ \dfrac{\max\{n \mid J_n \ge t\}}{|S|/2} & \text{otherwise} \end{cases} \]
Here, Jmax = max_{n′} Jn′, and so the t > Jmax cases accommodate t values larger than Jn ever […]
values of $\eta^{min}_t$ and $\eta^{max}_t$ […]
Figure 4.1: Relating ηmin and ηmax to min-entropy and mutual entropy, for the idealized model of
leakage explored in Sec. 4.1.1. Panel (b): varying w with fixed |C| = 2^28 and |S| = 2^32.
The numbers we report in this chapter are discrete approximations to these values via numerical
integration with a fixed subinterval width of 0.01.
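The numerical integration can be sketched as below. This is an illustration under stated assumptions: it takes ηmin and ηmax to be the integrals over t ∈ [0, 1] of the per-threshold quantities 1/min{n | Jn ≥ t} and max{n | Jn ≥ t}/(|S|/2), treats both as 0 when t exceeds the largest Jn, and uses a midpoint rule with width 0.01. The quadrature rule and the function name `eta_measures` are our choices for the example, not details taken from the implementation.

```python
def eta_measures(jn, secret_space_size, width=0.01):
    """Approximate (eta_min, eta_max) from a curve jn: {n: J_n}.

    Midpoint-rule integration over thresholds t in [0, 1] (assumed rule).
    """
    j_max = max(jn.values())
    steps = round(1 / width)
    eta_min = eta_max = 0.0
    for k in range(steps):
        t = (k + 0.5) * width
        if t > j_max:
            continue  # threshold never reached: contributes 0
        reached = [n for n, j in jn.items() if j >= t]
        eta_min += width / min(reached)
        eta_max += width * max(reached) / (secret_space_size / 2)
    return eta_min, eta_max

# Hypothetical J_n curve over a secret space of size 8.
curve = {1: 0.2, 2: 0.6, 4: 1.0}
print(eta_measures(curve, secret_space_size=8))
```

In this hypothetical curve the threshold is first reached at n = 1 only for t ≤ 0.2, at n = 2 for t ≤ 0.6, and at n = 4 otherwise, giving 0.2·1 + 0.4·(1/2) + 0.4·(1/4) = 0.5 for the first measure, while the largest n reaching every threshold is always 4 = |S|/2, giving 1 for the second.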
Roughly speaking, a larger value for ηmin suggests that information leaks from the procedure
for more secret values, and a larger value for ηmax suggests that more information leaks from the
procedure about secret values.5 To relate these measures to another used previously in the QIF
literature, namely min-entropy (e.g., [87, 31]), in Fig. 4.1 we show ηmin and ηmax in comparison
to the min-entropy of S(‘secret’), for our idealized setting above. Fig. 4.1(a) shows that ηmin
reflects the growth of |C| just as min-entropy can, and similarly, Fig. 4.1(b) shows that ηmax
reflects changes in w like min-entropy can. However, min-entropy does not distinguish between
these types of leakage. Mutual entropy (e.g., [23, 59, 69]) also reflects increasing leakage as |C|
grows in Fig. 4.1(a) and as w shrinks in Fig. 4.1(b), though its sensitivity to these effects is limited,
particularly that of increasing |C|, until |C| becomes quite large (Fig. 4.1(a)).
5 While these rules of thumb are accurate when Jn has no valley, they are less reliable when it does.
In such cases, a more reliable understanding can be obtained by examining the graph of Jn directly,
or at least by computing a separate ηmin and ηmax for each valley-free segment of Jn. Here, by
“valley” we mean values n, n′ where n < n′, Jn > Jn+1, Jn′ < Jn′+1, and Jn′′ = Jn′′+1 for each
n′′ ∈ [n + 1, n′ − 1]. We have not encountered Jn curves with valleys in practice, and so do not
discuss them further here.
proc (C, I, S)
  if (S(‘secret’) = C(‘test’))
    O(‘result’) ← rand() mod M
  else
    O(‘result’) ← M + (rand() mod 17)
  return O
(c) Ĵn for various n and M
Figure 4.2: An example showing limitations of J on procedures with randomness and improvements
offered by Ĵ (see Sec. 4.1.2)
4.1.2 Procedures with other inputs
The measures Jn, ηmin, and ηmax are appropriate when proc is deterministic and leverages
no inputs in I. When either of these restrictions is lifted, our approach described so far can
be unreliable. We illustrate this in Sec. 4.1.2(a) and then provide an alternative measure in
Sec. 4.1.2(b) that is more robust.
4.1.2(a) Limitations of Jn First consider a randomized password checker that receives a
secret password S(‘secret’) and a candidate password C(‘test’) and, for some constant M > 0,
outputs a random value in [0, M − 1] if the candidate password is equal to the secret password and
a random value in [M, M + 16] otherwise. Intuitively, the leakage of this procedure should be the
same as that of a deterministic password checker and independent of the value of M. However, as
shown in Fig. 4.2, the use of randomness here yields an unintuitive result, since Jn (Fig. 4.2(b))
is sensitive to the value of M. As such, while our detector does accurately detect leakage in this
case, it provides less help in comparing the leakage of two randomized implementations.
Another problem may arise when other inputs are allowed in I. Consider the example
proc (C, I, S)
O(‘result’)← ((S(‘secret’) > C(‘test’)) ? 1 : 0) ⊕ ((I(‘other’) ≤ 0) ? 1 : 0)
return O
Here, the expression “cond ? 1 : 0” evaluates to 1 if cond is true and 0 otherwise, and “⊕” represents
XOR. This procedure indicates that S(‘secret’) > C(‘test’) by returning 0 if I(‘other’) ≤ 0 or by
returning 1 if I(‘other’) > 0. Because our technique allows for any value of I(‘other’) consistent
with Πproc when estimating |YS |, it will compute Jn = 0 for any n, suggesting no leakage. However,
the only condition under which proc in fact leaks no information is if I(‘other’) is non-positive or
positive with equal probability from the adversary’s perspective.
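The effect described above can be reproduced by brute force on a small domain. The sketch below is illustrative only (the domains for C and I are arbitrary choices): when I(‘other’) may take either sign, both secrets produce the same set of 〈C,O〉 pairs, whereas fixing the sign of I(‘other’) exposes the comparison.

```python
from itertools import product

def proc(c, i, s):
    """O('result') = ((secret > test) ? 1 : 0) XOR ((other <= 0) ? 1 : 0)."""
    return int(s > c) ^ int(i <= 0)

def y_set(secrets, controls, others):
    """Y_S: <C, O> pairs consistent with some s in S and some I."""
    return {(c, proc(c, i, s)) for s, c, i in product(secrets, controls, others)}

def jaccard(y_a, y_b):
    union = y_a | y_b
    return len(y_a ^ y_b) / len(union) if union else 0.0

controls = range(4)

# I('other') unconstrained (either sign allowed): the views coincide.
j_any_sign = jaccard(y_set({0}, controls, (-1, 1)), y_set({3}, controls, (-1, 1)))

# I('other') known to be positive: the comparison result is exposed.
j_positive = jaccard(y_set({0}, controls, (1,)), y_set({3}, controls, (1,)))

print(j_any_sign, j_positive)
```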
4.1.2(b) An alternative measure To overcome the limitations of Jn as illustrated above,
in this section we propose a leakage measure that is more robust for procedures that employ
randomness or inputs in I. For convenience, here we treat all values generated at random within
the procedure instead as inputs represented in I; e.g., the first invocation of rand() within the
procedure is replaced with a reference to, say, I(‘rand[1]’), the second with I(‘rand[2]’), and so forth.
Intuitively, our measure employs an alternative definition for YS that also includes these additional
inputs. Specifically, consider the set […]
\[ \hat{X}_{S,S'} = \{ \langle C,O,I \rangle \mid \langle C,O,I \rangle \in \check{X}_{S,S'} \wedge \langle C,O \rangle \in Y_S \cap Y_{S'} \} \]
of 〈C,O, I〉 triples such that not only is 〈C,O〉 ∈ YS ∩ YS′ (cf. the definition of J(S, S′) in
(4.1)), but also the triple is consistent with some s ∈ S (i.e., 〈C,O, I〉 ∈ XS where XS = ⋃s∈S Xs).
By counting such 〈C,O, I〉 triples, the various random values (represented in I) become exposed in
X̂S,S′ and the number of these values for a given 〈C,O〉 pair acts as the “weight” of that pair. When
〈C,O, I〉 is from the following difference set, an attacker can determine whether the secret is from
S or S′.
\[ \tilde{X}_{S,S'} = \check{X}_{S,S'} \setminus \hat{X}_{S,S'} \tag{4.8} \]
[…] S, S′ : |S| = |S′| = n […]
Figure 4.3: Workflow of evaluating leakage, from left to right: label the different types of inputs
and outputs; generate postconditions Πproc using symbolic execution; optionally, compose
multi-execution constraints; perform model counting for different sizes of n; and generate our leakage
measures
Note that if VarsI = ∅, then Ĵn = Jn since in this case, 〈C,O〉 ∈ YS if and only if 〈C,O, ∅〉 ∈ XS.
The benefit of Ĵn is that it is far less susceptible to the variability that was demonstrated in
Sec. 4.1.2(a). For example, Fig. 4.2(c) shows that this measure is stable, independent of M. As
we will see in subsequent sections, however, it is also considerably costlier to estimate.
When we use Ĵn in place of Jn, we will annotate measures derived from it using similar notation.
For example, η̂min denotes ηmin computed using Ĵn in place of Jn, and similarly for η̂max.
4.2 Implementation
In this section, we discuss our implementation for computing the measures discussed in Sec. 4.1.
Fig. 4.3 shows the overall workflow for doing so. Sec. 4.2.1 illustrates the use of symbolic execution
(e.g., [12, 20]) for generating postconditions, with a focus on a particular optimization that proved
useful for our case studies in Sec. 4.4. At the core of our implementation is an adaptation of a
hash-based model counting technique, discussed in Secs. 4.2.2–4.2.4. In Sec. 4.2.5, we present an
adaptation for generating logical postconditions for multiple rounds of procedure executions.
4.2.1 From software procedure to logical postcondition
As mentioned in Sec. 4.1, the logical postcondition Πproc represents the relationship between
inputs and outputs induced by procedure proc. To extract Πproc from proc, we apply symbolic
execution to proc. After marking each input variable (i.e., each parameter in VarsC, VarsI,6 and
VarsS) symbolic before the user-defined entry point, we utilize KLEE [12] or S2E [20] to explore
all feasible execution paths through proc that reach a return. On each path through proc, the
symbolic execution engine accumulates a set of constraints among symbolic variables implied by
the branches taken and assignments computed along that path. These constraints coupled with the
assignments for VarsO defined by our API make observable, as accumulated through the return
instruction, form the postcondition for the path, and then Πproc is simply the disjunction of the
path conditions generated for each execution path.
6 To model the random input generated from random number generator rand() in symbolic execution, we
created a symbolic variable per rand() function call as its returned value.
Symbolic execution can suffer from state explosion, and so we leveraged an optimization in our
work to manage this explosion. Specifically, we implemented a searcher to perform state merging [61]
frequently, wherein the constraints accumulated along two or more execution prefixes ending at the
same instruction are disjoined and then simplified to the extent possible (using an SMT solver);
execution is then continued from their last instruction, accumulating more constraints into their
now-combined constraints. In doing so, these two execution prefixes need only be extended once,
versus each being extended separately if no merging occurred.
This optimization dramatically reduced the number of symbolic states managed in one of our
case studies in Sec. 4.4.3, improving the speed of extracting Πproc by more than 600×. For this
case study, we forced state merging to occur whenever a symbolic state was forked at a symbolic
branch. To reduce the complexity of the merged path constraint, however, we avoided merging two
path constraints when their expressions for the outputs in O differed or when two path constraints
(in conjunctive normal form) had less than half of their conjuncts in common.
To correctly measure the leakage, we assume the postcondition Πproc(C, I,S,O) is complete
and sound. Completeness means that if 〈C, I,S,O〉 is feasible for proc, then 〈C, I,S,O〉 satisfies
Πproc(C, I,S,O). Soundness requires that if 〈C, I,S,O〉 is infeasible for proc, then 〈C, I,S,O〉 does not
satisfy Πproc(C, I,S,O). A well-known limitation of symbolic execution is how to manage unbounded
loops, since these can prevent symbolic execution from terminating. In the case studies of Sec. 4.4
we bounded all inputs, which was enough in these case studies to ensure that symbolic execution
terminated. Provided that we bound the input parameters sufficiently loosely to encompass all
values they can take on in practice, this bounding does not impact the assessment provided by our
measures in practice.
Postcondition generation costs are summarized in Table 4.1. These computations were performed
on a DELL PowerEdge R710 server equipped with two 2.67GHz Intel Xeon 5550 processors and
128GB memory. Each processor includes 4 physical cores and had hyperthreading enabled. As
indicated in Table 4.1, we experimented with both KLEE and S2E to generate postconditions, depending
on the procedure. In the column headings, a ‘×1’ or ‘×16’ indicates the number of processes across
which the computation was divided. To enable multi-process support in KLEE (i.e., ‘×16’), we
made a small modification in KLEE’s execution engine, to cause it to explore only execution paths
starting from a predefined branching prefix. The designation ‘merging’ indicates the use of the
KLEE optimization summarized above; as indicated in Table 4.1, this optimization was remarkably
effective on the Linux TCP implementations discussed in Sec. 4.4.3. S2E was configured to utilize
its concolic execution capabilities.
4.4.1 Auto-complete 2d 12h
4.4.2 Gzip 3d 21h 8h
4.4.2 Smaz 2d 18h 6h
4.4.3 v3.18 7d 4d 17m
4.4.3 v3.18-patched 18m
4.4.3 v3.18-rmCounter 17m
4.2.2 Hash-based model counting for Jn
[…] for randomly selected, disjoint […] for specified sets S′′ (i.e., S′′ = S, S′′ = S′, or
S′′ = S ∪ S′). In this section, we provide two optimizations for producing such estimates.
4.2.2(a) Estimating |YS| Our first optimization is an adaptation of the approximate model
counting technique due to Chakraborty et al. [15] (see Sec. 2.4.1).
We estimate |YS| through an algorithm similar to that used in projected model counting, i.e., by
iteratively selecting H^b and p ∈ {0, 1}^b at random, but apply the hash function only to the C and O […]
That is, ZS,p ⊆ YS contains the elements of YS whose hash is p. Intuitively, this yields an
estimate […] > α (4.14)
where p ∈ {0, 1}^b, p̂ ∈ {0, 1}^{b−1}, and α is derived from ε [15]. Each such triple individually provides
an estimate that is within error ε with confidence at least 0.78 [15, Lemma 1], and the median of
the estimates for all such triples is within error ε with confidence that can be increased arbitrarily
[…] is an exact count of |YS| since ZS,p = YS.
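The intuition behind hash-based counting can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of Chakraborty et al. [15] nor our implementation: it enumerates an explicit set (the real tool enumerates solutions with a SAT solver), uses random XOR (parity) constraints as the hash, scales the size of one randomly chosen cell by 2^b, and takes the median over independent trials. The cell-size thresholds tied to α in (4.14) are omitted, and all names are hypothetical.

```python
import random
from statistics import median

def random_xor_hash(bits, b, rng):
    """b random parity constraints over `bits`-bit integers."""
    rows = [(rng.getrandbits(bits), rng.getrandbits(1)) for _ in range(b)]
    def h(x):
        return tuple((bin(x & mask).count("1") + const) % 2 for mask, const in rows)
    return h

def estimate_size(elements, bits, b, trials=25, seed=1):
    """Median over trials of 2^b * |{x in elements : H(x) = p}|."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        h = random_xor_hash(bits, b, rng)
        p = tuple(rng.getrandbits(1) for _ in range(b))
        cell = [x for x in elements if h(x) == p]
        estimates.append(len(cell) * 2 ** b)
    return median(estimates)

elements = range(1000)            # stands in for an encoded set Y_S
print(estimate_size(elements, bits=10, b=0))  # b = 0: the cell is the whole set
print(estimate_size(elements, bits=10, b=3))  # b = 3: scaled size of a random cell
```

With b = 0 there are no hash constraints, so the single cell equals the whole set and the count is exact; larger b trades accuracy for smaller cells to enumerate.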
4.2.2(b) Sampling S, S′ of Expected Size n A second expense of calculating YS and YS′
explicitly is in enumerating S and S′ themselves, especially if n is large. We can leverage hashing
similarly to the method above to avoid enumerating S and S′ directly for n = |S|/2^b for some
b ≥ 0.
Specifically, to estimate Jn for n = |S|/2^b, we select H^b and p ∈ {0, 1}^{b−1} at random and, for […]
\[ X_p = \{ \langle C,O,I \rangle \mid \exists S : \Pi_{proc}(C, I, S, O) \wedge H^{b-1}(S) = p \} \]
[…]
\[ Y_p = \{ \langle C,O \rangle \mid \exists I : \langle C,O,I \rangle \in X_p \} \]
in place of YS, YS′, and YS∪S′, respectively, to perform the calculations (4.11)–(4.12). And, of course,
[…] for a different, random hash function Ĥ^b̂ and random prefix p̂ ∈ {0, 1}^b̂. We then use the
algorithm […]
Two more points about this algorithm warrant emphasis:
• Because our algorithm explicitly enumerates the contents of each $Z^0_{p,\hat{p}}$ and $Z^1_{p,\hat{p}}$, when
leakage is detected (i.e., Jn > 0 for some n) these sets can be used to identify 〈C,O〉 pairs that are
in […]. These examples can guide developers in understanding the reason for the
leakage and in mitigating the problem.
• Because the number of secrets with a random length-b hash prefix p is only of expected size
n = |S|/2^b, for the rest of the chapter we use a definition of Jn as in (4.2) but weakened so that
|S| and |S′| equal n in expectation.
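The following sketch illustrates how a hash can define two disjoint secret sets of expected size |S|/2^b. It rests on an assumption about the construction that is not fully recoverable from the text above: S and S′ are taken to be the secrets whose b-bit hash equals the (b−1)-bit prefix p extended by 0 and by 1, respectively, so that their union is the set of secrets whose first b−1 hash bits equal p. Unlike the actual implementation, which encodes the hash as constraints and never enumerates secrets, the sketch enumerates them for clarity.

```python
import random

def sample_disjoint_sets(secret_bits, b, seed=0):
    """Split secrets by a random b-bit XOR hash: prefix p + 0 vs. prefix p + 1.

    Requires b >= 1. Returns (S, S'), which are disjoint by construction.
    """
    rng = random.Random(seed)
    rows = [(rng.getrandbits(secret_bits), rng.getrandbits(1)) for _ in range(b)]
    prefix = tuple(rng.getrandbits(1) for _ in range(b - 1))

    def h(s):
        return tuple((bin(s & mask).count("1") + const) % 2 for mask, const in rows)

    set_zero, set_one = [], []
    for s in range(2 ** secret_bits):
        digest = h(s)
        if digest[:-1] == prefix:  # first b-1 hash bits match p
            (set_zero if digest[-1] == 0 else set_one).append(s)
    return set_zero, set_one

s_a, s_b = sample_disjoint_sets(secret_bits=10, b=3)
print(len(s_a), len(s_b))  # each has expected size 2**10 / 2**3 = 128
```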
4.2.3 Hash-based model counting for Ĵn
The calculations of the previous section require some modifications when we are instead computing
Ĵn for n = |S|/2^b. Similar to the previous section, we can use Xp for p ∈ {0, 1}^{b−1} in place
[…] for a random S and S′, we need a different […]
∃S, S′, I′ : Πproc(C, I, S, O) ∧ Πproc(C, I′, S′, O) ∧ […]
since 〈C,O, I〉 ∈ X̂p iff 〈C,O, I〉 ∈ X^0 […]. This method does come at considerably greater
computational cost, however, due to the duplication of the constraints Πproc in the
specification of this set. We will demonstrate this in our case studies in Sec. 4.4.
4.2.4 Parameter settings for computing Jn and Ĵn
In the hash-based model counting described above, we use the 3-wise independent hash functions
suggested by Chakraborty et al. [15], and due to the large number of XOR clauses in the resulting
hash constraints, we use CryptoMiniSAT 5.0 [89] to enumerate the elements of each Zp,p̂. To
reduce the complexity of the hash constraints, we concretize their constant bits to minimize the
independent support [51] before generating XOR clauses. Multiple estimates of the form in (4.13),
for various values of b (in (4.13), or respectively b̂ in (4.15)), as prescribed by Chakraborty et al.,
are used to estimate |Yp|. We parameterized this algorithm with error ε = 0.45 and confidence
either δ = 0.99 in Sec. 4.3 or δ = 0.92 in Sec. 4.4,7 for which 50 or 5 〈b, p, p̂〉 triples satisfying (4.14)
sufficed, respectively.
We estimate Jn as the sample mean of J(S, S′) for sampled pairs S, S′ of expected size n (i.e.,
defined by a p ∈ {0, 1}^{b−1} for n = |S|/2^b). For each n we computed Jn using a number of sampled
pairs S, S′ equal to the larger of 100 and the minimum needed so that the standard error was
within 5% of the sample mean.
In addition, since Jn is only an estimate and so is subject to error and since that error is
influential in the calculation of ηmax or ηmin especially when n is small, we round any Jn ≤ 0.025
down to zero when calculating the measures. Ĵn is computed similarly.
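The sampling and rounding rules above can be sketched as follows. The function `estimate_mean` and its `draw` callback (which stands in for computing J(S, S′) on one freshly sampled pair) are hypothetical, and the `max_samples` cap is an addition for the sketch so that the loop always terminates.

```python
import random
from math import sqrt
from statistics import mean, stdev

def estimate_mean(draw, min_samples=100, rel_se=0.05, max_samples=100_000):
    """Sample until the standard error is within rel_se of the sample mean.

    At least min_samples values are drawn. Means at or below 0.025 are
    rounded down to zero, following the rule stated in the text.
    """
    values = [draw() for _ in range(min_samples)]
    while len(values) < max_samples:
        m = mean(values)
        se = stdev(values) / sqrt(len(values))
        if se <= rel_se * m:
            break
        values.append(draw())
    m = mean(values)
    return (0.0 if m <= 0.025 else m), len(values)

# Hypothetical stand-ins for per-pair Jaccard distances.
rng = random.Random(7)
print(estimate_mean(lambda: 0.5))                       # no variance: 100 samples suffice
print(estimate_mean(lambda: 0.01))                      # tiny mean: rounded down to zero
print(estimate_mean(lambda: float(rng.random() < 0.5))) # noisy: needs more than 100
```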
4.2.5 Logical postconditions for multiple procedure executions
In some scenarios it is insightful to observe the behavior of Jn for a procedure proc when it is
executed multiple times. That is, consider a scenario in which proc is executed r times, possibly
with relationships among the outputs of one execution and the inputs of another, or simply among
the inputs to different executions. Suppose these executions are denoted
O1 ← proc(C1, I1, S1)
O2 ← proc(C2, I2, S2)
· · ·
Or ← proc(Cr, Ir, Sr)
and that the postcondition of the j-th invocation in isolation is denoted $\Pi^j_{proc}$ (i.e., $\Pi^j_{proc}$ is
simply Πproc over the variables represented in Cj, Ij, Sj, and Oj). Then the relationships among
inputs and outputs can be described using additional, manually constructed constraints
$\Gamma^{1...r}_{proc}$. For example, if the secret input to each execution of proc is the same, then
$\Gamma^{1...r}_{proc}$ would include the statement that ‘secret’ has the same value in each execution
(i.e., S1(‘secret’) = S2(‘secret’) = . . . = Sr(‘secret’)).
[…] can reveal leakage that increases as the procedure is executed multiple times. We will see an
example in Sec. 4.4.
7 The error bound of Chakraborty et al. is conservative; e.g., the results for 95 benchmarks showed
less than 5% error in practice even when using ε = 0.75 [15].
Figure 4.4: A procedure that leaks about more secrets as M is decreased (see Sec. 4.3.1). Panel (c):
ηmin and ηmax for different M.
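The effect of composing executions under a shared secret can be illustrated by brute force. In the sketch below (an illustration only; the actual approach composes logical postconditions rather than enumerating views), the attacker's view of r rounds is the tuple of all chosen inputs and all outputs, and the constraint that the secret is the same in every round is enforced by reusing the same s. The procedure and the constant M = 4 are hypothetical choices for the example.

```python
from itertools import product

M = 4  # hypothetical constant

def proc(c, s):
    """Returns 1 if the guess c equals the secret's residue class mod M."""
    return int(s % M == c)

def y_rounds(secrets, controls, r):
    """Views <C_1..C_r, O_1..O_r> when the same secret is used in all r rounds."""
    return {(cs, tuple(proc(c, s) for c in cs))
            for s in secrets
            for cs in product(controls, repeat=r)}

def jaccard(set_a, set_b, controls, r):
    y_a, y_b = y_rounds(set_a, controls, r), y_rounds(set_b, controls, r)
    union = y_a | y_b
    return len(y_a ^ y_b) / len(union) if union else 0.0

controls = range(M)
j1 = jaccard({1}, {2}, controls, r=1)
j2 = jaccard({1}, {2}, controls, r=2)
print(j1, j2)
```

With one round, the two secrets are told apart only when the guess is 1 or 2 (distance 2/3); with two rounds, a view is shared only if neither guess is 1 or 2, so the distance grows to 6/7.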
4.3 Microbenchmark Evaluation
In this section we evaluate our methodology on artificially small examples to illustrate its
features.
Figure 4.5: A procedure that leaks more about secret values as M is increased (see Sec. 4.3.1).
Panel (c): ηmin and ηmax for different M; “nan” denotes “not a number,” i.e., ηmin = 0 or ηmax = 0.
4.3.1 Leaking more about secret values vs. leaking about more secret values
In Sec. 4.1.1, we showed through an idealized example how a small n is more useful for evaluating
the number of secrets about which information leaks, whereas a large n is more useful for evaluating
the amount of information leaked about these secrets. Now we will use two simple procedures with
a controllable constant M to quantitatively demonstrate the necessity of varying n and the correct
usage of ηmin and ηmax.
The first procedure, shown in Fig. 4.4(a), returns the secret value if it is divisible by a constant
M and returns zero otherwise, where both S(‘secret’) and M are 32-bit integers. This procedure
leaks the same amount of information (the whole secret) about a larger number of secret values
if M is decreased. The behavior of Jn shown in Fig. 4.4(b) is consistent with this observation.
Specifically, different values of M induce curves for Jn that differ primarily in the minimum value
of n where Jn is large. This behavior is also seen in the value of ηmin in Fig. 4.4(c), where ηmin
ranges from ηmin ≈ 0 at M = 2^31 to ηmin = 1 at M = 1.
Contrast this case with the procedure shown in Fig. 4.5(a), which returns the residue class of
the secret value modulo a constant value M. As such, as M is increased, more information about
each secret is leaked. This is demonstrated in Fig. 4.5(b), where the curves for different values of
M differ primarily in the maximum value n at which Jn is large. Similarly, ηmax ranges from
ηmax = 0 at M = 1 to ηmax ≈ 2^−0.8 ≈ 0.57 at M = 2^31.
An example that blends the previous two examples is shown in Fig. 4.6(a); here the
procedure returns 1 if S(‘secret’) mod M = C(‘test’) and 0 otherwise, where M is a 32-bit constant.
As such, this procedure leaks a lot about a few secret values when M is large, and a little about
many secret values when M is small. As shown in the r = 1 columns of Fig. 4.6(c), ηmin and ηmax
monotonically decrease and increase, respectively, as M grows.
         log2 ηmin                          log2 ηmax
M        r = 1   r = 2   r = 4   r = 6     r = 1   r = 2   r = 4   r = 6
2        −1.2    −1.1    −1.2    −1.1      −31.4   −31.3   −31.2   −31.3
4        −1.7    −0.9    −0.6    −0.4      −31.0   −30.2   −29.4   −29.0
8        −2.8    −1.7    −0.9    −0.6      −30.6   −29.3   −28.8   −28.3
64       −7.1    −5.4    −3.9    −2.9      −27.1   −25.9   −25.2   −25.1
2^10     −11.1   −9.5    −8.2    −7.5      −22.8   −22.0   −21.5   −21.1
2^29     −29.9   −28.9   −27.1   −26.5     −3.4    −2.7    −2.1    −1.9
2^31     −31.0   −30.2   −28.8   −28.2     −1.2    −0.7    −0.4    −0.2
(c) ηmin and ηmax for different M
Figure 4.6: Leakage of procedure that checks a guess of secret’s residue class modulo M (see
Sec. 4.3.1–4.3.2)
4.3.2 Leaking more over multiple rounds
A second way to view the example in Fig. 4.6 is to consider r procedure executions using
the same S(‘secret’) (i.e., S1(‘secret’) = S2(‘secret’) = . . . = Sr(‘secret’)). Our intuition suggests
that after r = M − 1 executions of the procedure, a smart attacker will have learned everything
about S(‘secret’) that it can from proc; e.g., by setting Cj(‘test’) = j, the attacker either will have
observed some Oj(‘result’) = 1, in which case it knows S(‘secret’) mod M = j, or else it knows
S(‘secret’) mod M = 0. Consistent with that intuition, in Fig. 4.6(c), both ηmin and ηmax remain
steady for M = 2 as r increases, since no new information is available to the attacker after r = 1.
Similarly, for M = 4, ηmin and ηmax both increase precipitously (by ≥ 74%) from r = 1 to r = 2
and then begin to flatten out (albeit imperfectly—both are estimated values, after all), which is
consistent with this intuition that the attacker should learn no new information past r = 3. For
M > 4, each additional procedure execution provides additional information to the attacker about
all secrets and much more about some (namely those for which it learns the residue class mod M).
Correspondingly, both ηmin and ηmax increase monotonically along each of these rows.
Figure 4.7: An example illustrating leakage dependent on randomness (see Sec. 4.3.3). Panel (c):
η̂min and η̂max for different M.
4.3.3 Leaking the secret conditioned on randomness
We now illustrate the ability of our technique to measure leakage from a randomized
procedure different from that shown in Fig. 4.2. The procedure, shown in Fig. 4.7(a), returns the
secret if a random value is divisible by a constant M and returns that random value otherwise.
Clearly, a larger M implies that fewer secret values leak, but those that leak do so completely. This
behavior is illustrated by the Ĵn measure shown in Fig. 4.7(b); the leakage is consistently higher
for lower values of M. Similarly, while η̂max remains high for all values of M (never dropping
below 1/4), η̂min ranges from η̂min = 1 when all secrets are leaked (M = 1) to η̂min ≈ 0 when few
secrets are leaked (M = 2^31).
4.4 Case Studies
In this section, we illustrate our measurement by applying it to real-world codebases susceptible
to the inference of search queries via packet-size observations, inference of secret values due to
compression results, and inference of TCP sequence numbers. We claim no novelty in identifying
these attacks; all are known and explored in other papers, though not in the particular codebases
(or codebase versions) that we examine here and typically only through application-specific analysis.
Our contribution lies in showing the applications of our methodology to measuring interference in
an application-agnostic way and the impact of alternatives for mitigating that interference.
4.4.1 Traffic analysis on web applications
Packet sizes are a known side channel for reverse engineering search queries and other web
content returned from a server, and defenses against this side channel have been studied using
various methods of QIF (e.g., [52, 96, 18]). Specifically, a network attacker can often distinguish
between two queries to a web search engine because the response traffic length is dependent on the
query. Even packet padding may not hide all secret information [34].
Keyword   Trigrams
class     c cl cla las ass ss s
code      c co cod ode de e
div       d di div iv v
the       t th the he e
and       a an and nd d
title     t ti tit itl tle le e
(c) η̂min and η̂max for different mitigations; “nan” denotes “not a number,” i.e., η̂min = 0
or η̂max = 0
Figure 4.8: Analysis of auto-complete feature of Sphinx and mitigation strategies (see Sec. 4.4.1)
In this section, we use our methodology to analyze the auto-complete feature of search engines
to demonstrate our ability to detect the leakage of the user’s query from the network packet sizes.
Furthermore, we repeat our analysis after applying mitigations suggested in previous work [34].
This allows us to compare the effectiveness of these mitigations to the original implementation.
We evaluated a C++ web server called Sphinx (http://sphinxsearch.com/), which provides
PHP APIs for a client to send a query string to the server. The auto-complete feature then
returns a list of keywords that best match the query string. To generate the postcondition that
characterizes the auto-complete feature, we marked the query string as the secret (i.e., S(‘secret’)
is the query string) and the final application response length as the observable (i.e., VarsO =
{‘response length’}), by injecting only two lines into the server’s code. In this application, there
was no attacker-controlled input and no other input (i.e., VarsC = VarsI = ∅).
Since the auto-complete results depend on the contents of the server database, we simply
instantiated the database with an example containing six keywords and 35 query trigrams (see
Fig. 4.8(a)).
When provided an input query string of at least three characters, Sphinx returns (content
containing) the two keywords with the highest “score” based on matching trigrams in the query string
to each keyword’s associated trigrams. We also limited queries to three characters drawn from
{‘a’, . . . , ‘z’} ({97, . . . , 122} in ASCII), yielding 26^3 ≈ 2^14 possible queries. Note that instantiating
the server with a specific database and limiting the query characters and length as described cannot
induce our analysis to provide false positives, though it can contribute false negatives.
We experimented with two types of mitigation strategies. Random padding is motivated by
protocols like SSH that obfuscate traffic lengths by adding a random amount of padding up to
some maximum limit to the application response payload. We experimented with padding lengths
of up to 2 bytes (‘rand.2’), 16 bytes (‘rand.16’), 64 bytes (‘rand.64’), and 128 bytes (‘rand.128’).
Padding to a fixed length is a second strategy, which increases the length of the application response
payload to the nearest multiple of a fixed length. We experimented with padding to a multiple of 64
bytes (‘fixed.64’) or a multiple of 256 bytes (‘fixed.256’). We “implemented” both of these padding
strategies by modifying the postcondition ΠSphinx to reflect them (vs. modifying the Sphinx code
directly).
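The two padding strategies can be modeled on response lengths alone, as in the sketch below. It is an illustration under assumptions: random padding is taken to append between 0 and the maximum number of bytes inclusive, and the response lengths in the example are made-up values rather than measurements from Sphinx.

```python
def fixed_pad(length, block):
    """Pad a payload length up to the nearest multiple of `block`."""
    return -(-length // block) * block  # ceiling division, then scale

def random_pad_views(length, max_pad):
    """All lengths an observer may see when 0..max_pad random bytes are appended."""
    return set(range(length, length + max_pad + 1))

# Hypothetical response lengths for three queries.
lengths = {"cla": 120, "cod": 131, "div": 119}

print({q: fixed_pad(n, 64) for q, n in lengths.items()})
print({q: fixed_pad(n, 256) for q, n in lengths.items()})

# Overlap of observable lengths under random padding of up to 16 bytes.
shared = random_pad_views(lengths["cla"], 16) & random_pad_views(lengths["cod"], 16)
print(sorted(shared))
```

In this made-up example, padding to a multiple of 64 still separates the 131-byte response from the other two, padding to a multiple of 256 makes all three identical, and random padding makes some, but not all, observed lengths ambiguous between two queries.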
Fig. 4.8(b) shows Ĵn for the random padding strategies and Jn (which is equivalent to Ĵn
since VarsI = ∅) for the original, ‘fixed.64’, and ‘fixed.256’ strategies. Here, ‘nopadding’ is the
result for the original auto-complete in Sphinx. In addition, Fig. 4.8(c) shows the measures η̂min and
η̂max for each strategy. Only ‘fixed.256’ reaches zero leakage, indicated by ‘nan’ (‘not a number’),
since any result from Sphinx populated with the database in Fig. 4.8(a) fit within 256 bytes and
so resulted in a padded payload of that length. Comparing different padding mechanisms, our
measures η̂min and η̂max show results consistent with the intuitive order of the different mitigation
strategies in terms of their effectiveness in preventing leakage. Our results suggest that ‘nopadding’
leaks the most, followed by ‘rand.2’. The configuration ‘rand.16’ was only very slightly worse than
‘rand.64’ and ‘fixed.64’, which provided similar protection for this setup, and ‘rand.128’ provided
better protection than all others except ‘fixed.256’. These results demonstrate the power of our
methodology for enabling comparisons of the benefits of different amounts of padding for this
database. For example, our analysis shows that for this database, ‘rand.64’ provides little security […]
        r = 1   r = 2   r = 3     r = 1   r = 2   r = 3
Gzip    −2.04   −1.22   −0.85     −1.00   −0.58   −0.43
Smaz    −1.58   −1.55   −1.54     −3.73   −4.02   −3.95
(b) η̂min and η̂max for different r
Figure 4.9: Leakage from Gzip and Smaz (see Sec. 4.4.2)
4.4.2 Leakage in compression algorithms
Our methodology is powerful in accounting for attacker-controlled inputs, and in this section we
demonstrate the benefits of this capability by applying it to detect CRIME attacks [53, 4]. A CRIME
vulnerability arises when a web client applies “unsafe” compression prior to transmitting a request
over TLS. HTTP requests can carry information (e.g., the URL parameters) that an attacker
can induce; e.g., if the client visits an attacker-controlled website, then the attacker can induce
requests from the client to another, target website with URL parameters that the attacker sets. By
observing the lengths of compressed requests to the target website, the attacker can deduce whether
the attacker-controlled input shares a substring with a secret contained in the request (e.g., the
client’s cookie for the target website) that the attacker is unable to observe directly. To be concrete,
if the attacker-induced request to the target website is http://target.com?username=name then
the request will compress better if name is a prefix of the client’s cookie for target.com.
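To make the length side channel concrete, the following Python sketch (an illustration only, not part of our tool) compresses a request containing a secret cookie and an attacker-chosen URL parameter with zlib, and compares the compressed length for a matching and a non-matching guess. The request layout, the cookie value, and the helper name compressed_len are assumptions made for this example.

```python
import zlib


def compressed_len(secret: str, guess: str) -> int:
    """Length of the DEFLATE-compressed request, i.e., what the attacker observes."""
    # Hypothetical request layout: attacker-chosen parameter first, secret cookie second.
    request = f"GET /?username=cookie={guess} HTTP/1.1\r\nCookie: cookie={secret}\r\n"
    return len(zlib.compress(request.encode(), 9))


SECRET = "k3q9zv7m1x5b8w2d"  # hypothetical cookie value, unknown to the attacker

right = compressed_len(SECRET, SECRET)              # guess equals the secret
wrong = compressed_len(SECRET, "p0ucr4jy6hne2gsf")  # same length, different value

# The matching guess lets the compressor reuse a longer earlier substring.
print(right, wrong, right < wrong)
```

Under these assumptions the request with the matching guess compresses to fewer bytes, which is the signal a CRIME attacker looks for.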
CRIME attacks exploit a property of adaptive compression algorithms: the encoding
dictionary depends on both the secret and the attacker-controlled variables. As suggested by
Alawatugoda et al. [4], a possible mitigation is to separate the compression for the secret and the
other parts of the plaintext or to use a fixed-dictionary compression algorithm such as Smaz [85].
The latter mitigation, though an improvement, removes the influence of the attacker-controlled
input only on the compression dictionary. Consider a two-byte plaintext ab whose first character is
secret. If a is ‘a’, then this two-byte word will be compressed if b is ‘t’ and will be left unchanged
if b is ‘y’, assuming ‘at’ is in the dictionary but ‘ay’ is not. Thus, the leakage should not be zero
even if a fixed-dictionary algorithm is used.
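The two-byte example above can be sketched in Python as follows. This is a toy compressor written for illustration, not the Smaz implementation: the two-entry dictionary, the one-byte codes, and the function names are all hypothetical.

```python
# Hypothetical fixed dictionary; real Smaz ships a much larger built-in codebook.
DICTIONARY = {"at": b"\x01", "th": b"\x02"}


def fixed_dict_compress(text: str) -> bytes:
    """Greedy toy compressor: replace dictionary digrams by one-byte codes."""
    out = bytearray()
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DICTIONARY:
            out += DICTIONARY[pair]
            i += 2
        else:
            out += text[i].encode()
            i += 1
    return bytes(out)


def observed_length(secret_char: str, attacker_char: str) -> int:
    """Attacker-observable output: length of the compressed two-byte plaintext."""
    return len(fixed_dict_compress(secret_char + attacker_char))


print(observed_length("a", "t"))  # 1: 'at' is in the dictionary
print(observed_length("a", "y"))  # 2: 'ay' is not
print(observed_length("b", "t"))  # 2: 'bt' is not
```

With the attacker-chosen byte fixed to ‘t’, an output of length 1 is possible only when the secret byte is ‘a’, so the output still depends on the secret even though the dictionary itself does not.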
To analyze this scenario in our framework, we modeled the input for Gzip and Smaz to be of
the form
‘http://target.com/? secret=’ + S(‘secret’) + I(‘suffix’) + ‘,username=secret=’ + C(‘input’)
where ‘+’ denotes concatenation. Here, S(‘secret’) and C(‘input’) were each one byte, I(‘suffix’) was
two bytes, and the attacker-observable variable was the length of the compressed string. Each byte
was allowed to range over ‘a’, . . ., ‘z’ and ‘0’,. . .,‘9’. The S(‘secret’) byte after the first ‘secret=’
plays an analogous role to the client cookie in a CRIME attack, i.e., as the secret to be guessed by
the attacker, and the ‘secret=’ immediately following ‘username=’ serves as a prefix to match the
first instance of ‘secret=.’
We applied our tool to analyze the leakage susceptibility of Gzip-1.2.4 and Smaz in this configuration,
executed up to three times (r ∈ {1, 2, 3}) with the same secret. The results, shown in
Fig. 4.9, indicate that for one execution (r = 1), Smaz is no better than Gzip. That is,
η̂min and η̂max in Fig. 4.9(b) suggest that Smaz leaks less information about some secrets but some
information about more secret values versus Gzip; as mentioned above, Smaz can leak information
about a secret value if it composes a word in its dictionary, as well. However, the strength of Smaz
is revealed as r grows, since its leakage remains unchanged. In contrast, the leakage of Gzip grows
with r, essentially matching that of Smaz at r = 2 and surpassing it at r = 3 (in terms of η̂min).
This occurs because in each execution of Gzip, the attacker has the latitude to select a different
value for C(‘input’) and then observe that selection’s impact on the length of the compressed string
(which in general will change). In contrast, the leakage of Smaz is independent of the adversary’s
choice for C(‘input’), and so additional executions do not leak any additional information.
As discussed at the end of Sec. 4.2.2, a side effect of our methodology is identifying some
example 〈C,O〉 pairs that lie in YS \ YS′ or YS′ \ YS for samples S, S′ of secrets, which can help
in diagnosing a leak. For example, in Table 4.2, for Gzip in the r = 1 case, our tool identified
the 〈C,O〉 pair with C(‘input’) = ‘c’ and O(‘length’) = 66 as being in YS \ YS′ for a sampled S,
S′ where S ∋ ‘c’ = S(‘secret’) and I(‘suffix’) = ‘oo’.⁸ As such, the developer now knows that
this 〈C,O〉 pair is consistent with no secret in S′. Similarly, for Smaz our tool identified the pair
〈C,O〉 with C(‘input’) = ‘r’ and O(‘length’) = 36 as being in YS \ YS′ for a sampled S, S′ where
S ∋ ‘f’ = S(‘secret’) and I(‘suffix’) = ‘or’.
Table 4.2: Examples from YS \ YS′ for samples S, S′ (r = 1) in CRIME attacks
        C(‘input’)   O(‘length’)   S(‘secret’)   I(‘suffix’)
Gzip    ‘c’          66            ‘c’           ‘oo’
Smaz    ‘r’          36            ‘f’           ‘or’
4.4.3 Linux TCP sequence number leakage
Known side channels in some TCP implementations leak TCP sequence and acknowledgment
numbers [72, 79]. In some cases, these side channels can be used by off-path attackers to terminate
or inject malicious payload into connections [13, 79]. The origin of these attacks is shared network
counters (e.g., linux_mib and tcp_mib) that are used to record connection statistics across different
connections in the same network namespace.
These counters have been implicated in numerous side channels since version 2.0 of the Linux
kernel [64]. For example, the code snippet (without the patch in Lines 6–10) in Fig. 4.10 leaks
the secret tp->rcv_nxt in Linux-3.18 TCP. Here, the attacker controls the skb input and so
the value TCP_SKB_CB(skb)->seq that is compared to tp->rcv_nxt on Line 5. Based on this
comparison, the NET_INC_STATS_BH procedure increments an attacker-observable counter indicated
by LINUX_MIB_DELAYEDACKLOST (Line 11). If the attacker can repeatedly cause the procedure in
Fig. 4.10 to be invoked with inputs skb of its choice, it can use binary search to infer tp->rcv_nxt
within 32 executions [79].
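The binary search can be sketched as follows. This Python model is a simplification made for illustration: it reduces each invocation of the procedure to a yes/no answer (did the public counter move?), ignores the end_seq check on Line 4, and uses helper names (make_probe, infer_rcv_nxt) that are our own. The comparison mirrors the usual wrap-around semantics of 32-bit sequence-number ordering.

```python
MASK = 0xFFFFFFFF  # TCP sequence numbers are 32-bit values


def before(seq1: int, seq2: int) -> bool:
    """Wrap-around comparison: true if seq1 precedes seq2 modulo 2**32."""
    return ((seq1 - seq2) & MASK) >= 0x80000000


def make_probe(rcv_nxt: int):
    """Model one invocation of the vulnerable procedure as a yes/no oracle."""
    counter = 0  # stands in for the public DELAYEDACKLOST counter

    def probe(seq: int) -> bool:
        nonlocal counter
        old = counter
        if before(seq, rcv_nxt):
            counter += 1
        return counter != old  # did the attacker see the counter move?

    return probe


def infer_rcv_nxt(probe):
    """Recover the secret with one orientation probe plus 31 halving probes."""
    probes = 1
    if probe(0):
        lo, hi = 1, 1 << 31              # secret lies in [1, 2**31]
    else:
        lo, hi = (1 << 31) + 1, 1 << 32  # secret lies in [2**31 + 1, 2**32]; 2**32 wraps to 0
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid & MASK):            # mid precedes the secret
            lo = mid + 1
        else:
            hi = mid
        probes += 1
    return lo & MASK, probes


secret = 0xC0FFEE42  # hypothetical value of tp->rcv_nxt
print(infer_rcv_nxt(make_probe(secret)) == (secret, 32))
```

The first probe halves the 32-bit space so that ordinary integer order and wrap-around order agree within the remaining range; each of the following 31 probes then yields one further bit.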
The most straightforward mitigation for this leakage is to disable the public counters. This
will stop the leakage, but will disable some mechanisms such as audit logging. Another potential
mitigation is to make the public counter harder to increment, by adding checks
that involve more secret variables. For example, before incrementing the LINUX_MIB_DELAYEDACKLOST
counter, the procedure could also check for correct acknowledgment numbers, as shown in the patch
in Lines 6–10. As far as we know, our study is the first to compare these potential mitigations for
TCP sequence and acknowledgment number leakage.
⁸ The output length of 66 exceeds the length of the input string because Gzip adds a header to the output. Smaz
attaches no such header.
 1   void tcp_send_dupack(struct sock *sk,
 2                        const struct sk_buff *skb) {
 3     struct tcp_sock *tp = tcp_sk(sk);
 4     if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
 5         before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
 6 +     if (before(TCP_SKB_CB(skb)->ack_seq, tp->snd_una - tp->max_window)
 7 +         || after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
 8 +       tcp_send_ack(sk);
 9 +       return;
10 +     }
       ...
Figure 4.10: A code snippet vulnerable to leaking the TCP sequence number in Linux 3.18; lines
marked ‘+’ indicate a hypothetical patch with which we experimented (see Sec. 4.4.3)
To analyze the information leakage in this example, we compiled a user-mode Linux kernel [30]
as a library. Our target procedure for analysis was tcp_rcv_established, which is of the form
void tcp_rcv_established(struct sock *sk,
                         struct sk_buff *skb,
                         const struct tcphdr *th,
                         unsigned int len) {
  struct tcp_sock *tp = (struct tcp_sock *) sk;
  ...
}
The inputs for tcp_rcv_established have many constraints among them when passed in, for
instance
  TCP_SKB_CB(skb)->seq < TCP_SKB_CB(skb)->end_seq
  tp->rcv_wnd ≤ MAX_TCP_WINDOW
  tp->snd_wnd ≤ MAX_TCP_WINDOW
To generate constraints for the inputs to tcp_rcv_established, we applied symbolic execution to
the procedures fill_packet and tcp_init_sock. Symbolic buffers to represent these inputs and
their associated constraints were then assembled within tcp_rcv_established. We also stubbed
out several procedure calls⁹ within tcp_rcv_established, causing each to simply return a symbolic
buffer so as to avoid symbolically executing it, since doing so introduced problems for KLEE (e.g.,
dereferencing symbolic pointers).
After generating the postcondition for the procedure tcp_rcv_established, we defined the
attacker-controlled inputs to be
  VarsC = {TCP_SKB_CB(skb)->seq,
           TCP_SKB_CB(skb)->end_seq,
           TCP_SKB_CB(skb)->ack_seq,
           tcp_flag_word(th)}
(each four bytes) and the attacker-observable variables to be VarsO = {linux_mib, tcp_mib}. All
fields of constrained input structures (e.g., tp->snd_una and tp->max_window) not covered by VarsC
and VarsO were added to VarsI, with the secret variables¹⁰ being tp->rcv_nxt and tp->snd_nxt
(each four bytes). We conducted single-execution (r = 1, denoted ‘v3.18-1run’), two-execution
(r = 2, denoted ‘v3.18-2run’) and three-execution (r = 3, denoted ‘v3.18-3run’) leakage analysis.
In the multi-execution analysis, we assumed *sk to be the same in multiple executions (I1(‘*sk’)
= I2(‘*sk’) = . . . = Ir(‘*sk’)) since its fields used in tcp_rcv_established would be unchanged or,
if changed, would be changed predictably.
The results from this analysis are shown in Fig. 4.11. The inset graph in Fig. 4.11(a) is a
magnification of the portion of the curve in the interval [0, 8] on the horizontal axis. Specifically,
the highest leakage resulted from ‘v3.18-3run’, followed by ‘v3.18-2run’ and ‘v3.18-1run’, as indicated
by the Ĵn curves in Fig. 4.11(a) and the η̂min and η̂max measures in Fig. 4.11(b). This
shows the potential for the attacker to extract more information about the secrets tp->rcv_nxt
and tp->snd_nxt using multiple executions. This is consistent with the observation that a smart
attacker could utilize this side channel to infer one bit per execution [79].
To alleviate this leak, we applied a hypothetical patch shown in Fig. 4.10 that checks another
secret value tp->snd_nxt before incrementing the counter for LINUX_MIB_DELAYEDACKLOST. Our
analysis results (for r = 1 execution, denoted ‘v3.18-patched’) in Fig. 4.11 show that the patch
alleviated the leakage somewhat. We also tried just deleting Lines 5–11 from the original (unpatched)
code in Fig. 4.10. As shown in Fig. 4.11, this version (denoted ‘v3.18-rmCounter’) evidently has
lower leakage than ‘v3.18-patched’. In considering these mitigations, we stress that our patch
addressed only the leakage arising from Line 5, and not all sources that leak information about
tp->rcv_nxt or tp->snd_nxt (which are numerous, see Chen et al. [17]). Our results suggest,
however, that our methodology could guide developers in mitigating leaks in their code.
⁹ Specifically, we stubbed out get_seconds, current_thread_info, tcp_options_write, tcp_sendmsg, prandom_bytes,
current_thread_info, tcp_parse_options, and tcp_checksum_complete_user.
(b) η̂min and η̂max for versions of tcp_rcv_established
Figure 4.11: TCP sequence-number leakage (see Sec. 4.4.3)
4.4.4 Performance
Performance of our tool involves two major components, namely the time to compute the
postcondition Πproc via symbolic execution, and the time to calculate Jn or Ĵn for different n
starting from Πproc. Postcondition generation is not a topic in which we innovate, and so we defer
discussion of its costs in our case studies to Sec. 4.2.1. Here we focus on the costs of calculating Jn
or Ĵn for different n starting from Πproc.
Starting from Πproc, the computation of Jn or Ĵn can be parallelized almost arbitrarily. Not
only can Jn or Ĵn for each n be computed independently, but even for a single value of n, the
estimation of J(S, S′) or Ĵ(S, S′) can be computed for each pair of sampled sets S, S′ and each
estimation iteration independently. In Fig. 4.12, we report the average estimation time per sample
pair, which indicates that all case studies could finish one estimation in (4.13) for one sample pair
within about one minute. As such, the speed of calculating the final pair η̂min and η̂max is limited
primarily by the number of processors available for the computation.
Sec.    Procedure                    J(S, S′)   Ĵ(S, S′)   Jn    Ĵn
4.4.1   Auto-complete (nopadding)    34ms       56ms       5m    7m
4.4.1   Auto-complete (fix.64)       48ms       65ms       6m    8m
4.4.1   Auto-complete (fix.256)      43ms       57ms       6m    7m
4.4.1   Auto-complete (rand.64)      –          1.2s       –     15m
4.4.2   Gzip                         –          26s        –     4h
4.4.2   Smaz                         –          40s        –     10h
4.4.3   v3.18-1run                   –          73s        –     20h
4.4.3   v3.18-patched                –          67s        –     20h
4.4.3   v3.18-rmCounter              –          50s        –     19h
Figure 4.12: Average time per estimate (J(S, S′) or Ĵ(S, S′)) and most expensive overall time (Jn
or Ĵn) for case studies
In our experiments, performed on a DELL PowerEdge R815 server with 2.3GHz AMD Opteron
6376 processors and 128GB memory, we computed Jn or Ĵn per value of n on its own core. As
reported in the last two columns of Fig. 4.12, the time to do so for the most expensive value of n
ranged from roughly 15m for the auto-complete procedure of Sec. 4.4.1 to about 20h for the Linux
TCP implementations of Sec. 4.4.3. For several of our case studies (see Fig. 4.12), we experimented
with calculating Ĵn even when Jn was sufficient, and found its estimation to cost ≤ 2× that of
estimating Jn, due to the duplication of Πproc in X̂p.
To place the above numbers in some context, the ≈ 20h (for the worst n, without parallelization)
dedicated to computing a value of Jn in the Linux TCP case study of Sec. 4.4.3 involved a procedure
proc of which 165 bytes of its inputs were somehow used in the procedure. A naive alternative to
our design in which all possible inputs are enumerated and run through the procedure to compute
its outputs (and interference measured from these input-output pairs, perhaps as we do) would
therefore involve enumerating 2^1320 possible inputs (165 bytes × 8 bits per byte = 1320 bits), which is obviously impractical.
In this light, our technique that performs interference analysis for real codebases in the timeframe
of minutes-to-hours (and far faster with parallelization) is a dramatic improvement. Moreover,
these results are likely to only improve with advances in symbolic execution and model counting.
Even our experimentation with various optimizations for postcondition generation and model counting
was not exhaustive. That said, the results above suggest that the costs of our approach are
likely to remain sufficiently high for real codebases to preclude its use for interactive analysis by
human programmers. Rather, we expect that our analysis could be run as a diagnostic technique
overnight, for example.
4.5 Discussion and Limitations
The static method to quantify noninterference builds from two tasks that are recognized, difficult
challenges in computer science. The first is the construction of a logical postcondition Πproc for
a procedure proc, for which we leverage symbolic execution. As such, our technique inherits the
limitations of existing symbolic execution tools and those incumbent on generating postconditions,
more generally. For example, symbolic execution is difficult to scale to some procedures, and
challenges involving symbolic pointers and unbounded loops can require workarounds, as they did
in our TCP case study (Sec. 4.4.3). The second challenge problem underpinning our methodology is
model counting, which is #P-complete. We are optimistic that future improvements in these areas
will be amenable to adoption within our methodology. While the resulting tool is not yet quick
enough to support interactive use, it is positioned to benefit from advances in symbolic execution
and approximate model counting, both active areas of research.
Our approach is powerful in that it can be applied to scenarios in which the distributions of
inputs—whether they be attacker controlled or other—are unknown, and this is often the case in
practice. In some cases, the input distributions are unknowable, especially for VarsC . In others,
they may be knowable but require considerable empirical data to estimate (e.g., the distributions
of user-input search terms, in a context like that of Sec. 4.4.1). That said, because it is insensitive
to these distributions, it does not offer an immediate way to accommodate these distributions if
they are known. Still, our methodology allows these inputs to be accounted for in a principled way,
in contrast to others that either disallow them or assign them heuristically.
Our measure does not take into account computational limits on the attacker, as is typically
assumed by cryptographic algorithms. As such, our measure is best viewed as a measure of
information-theoretic security, where the attacker has unlimited computation power. For example,
assuming it were possible to generate our measure for a digital signature algorithm, our measure
would typically indicate that a signature and public key divulges the private key, even if the private
key is intractable to compute. That said, due to hardness in solving a cryptographic problem with




In this chapter we have suggested a static method for assessing interference and attempts to
mitigate it. Informally, noninterference is achieved when the output produced by a procedure in
response to an adversary’s input is unaffected by secret values that the adversary is not authorized
to observe. Following this intuition, we have developed an automatic tool to estimate the number
of pairs of attacker-controlled inputs and attacker-observable outputs that are possible, conditioned
on the secret being limited to a particular sample. The discovery of such pairs that are possible for
one sample but not another reveals interference.
We demonstrated the effectiveness of our strategy both on artificial examples (Sec. 4.3) and on real-world
codebases (Sec. 4.4). Specifically, we evaluated leakage through response sizes in the auto-complete
feature of the Sphinx search interface, and the effectiveness of a variety of mitigations
(Sec. 4.4.1); the CRIME vulnerabilities of adaptive compression in Gzip and fixed-dictionary compression
in Smaz (Sec. 4.4.2); and leakage of TCP sequence numbers in Linux and the effectiveness
of two mitigations of our own design (Sec. 4.4.3). Within these contexts we also explored leakage
over a single procedure execution and over many, and showed that our framework allowed for a
useful comparison of how procedures leak data as the number of executions grows.
Central to our methodology’s ability to scale to real codebases is our expression of leakage
assessment within a framework that permits the use of approximate model counting (and specifically
hash-based model counting).
CHAPTER 5: DECLASSIFICATION AND INTERPRETABILITY
Based on the quantitative noninterference measure, this chapter permits information to be
declassified to focus on actual unintended leakage and interprets leakage measurements for the
analyst in terms of simple rules that characterize when leakage occurs.
While noninterference measurement for arbitrary computations remains out of reach, in this
chapter we address the shortcomings in our measuring framework within a particularly important
and complex domain, namely information leaks arising in hardware processors. Leakage of
software secrets due to processor optimizations has attracted massive attention in recent years,
especially since the discovery of vulnerabilities arising due to the footprint of speculative executions
in processor caches (Spectre [57], Meltdown [65], and variants). Even though many defenses
(e.g., [91, 101, 93, 90]) have been proposed to restrict cache-based side channels, we are aware of
no measurement methodology to compare designs and evaluate their effectiveness, working directly
from their Verilog specifications. Adapting the framework proposed in Chapter 4 to do so, however,
appears difficult, due to the sheer complexity of modern processor designs.
In this chapter, we present a methodology that does so, using three key methodological advances:
• Since generating a logical postcondition for a processor’s execution of a program en masse is
intractable, we devise a method to build the postcondition one cycle at a time. To build single-cycle
formulas, we abandon symbolic execution, as we found that applying it to hardware designs
induces significant path explosion for even one CPU cycle. Instead, we extract the single-cycle
formulas without solving for feasible paths, and then leverage a number of aggressive optimizations
when stitching single-cycle formulas together to build the postcondition for the processor’s
multi-cycle execution. In doing so, we need to be careful that these optimizations preserve
the number of assignments to relevant variables across all solutions, a so-called projected model
counting problem.
• Since some leakage is inevitable, our methodology enables analysts to declassify certain information,
thereby focusing the measurement on any other leakage that might be occurring, i.e.,
leakage that cannot be inferred from the declassified information. For systems as complex as
modern processors, this ability is essential to permit analysts to decompose and analyze leakage
in a piecemeal fashion.
• The sheer complexity of processor designs means that once leakage is measured, the exact conditions
that cause this leakage might not immediately be evident. Our methodology therefore
incorporates a method of interpreting the leakage, i.e., providing simple rules that indicate circumstances
in which leakage will (or will not) occur. Each such rule is additionally accompanied
by a precision and recall, so that analysts can prioritize the rules they address.
Due to the focus of our methodology on support for declassification and interpretability, we call
our tool that realizes it DINoMe (for “Declassification and Interpretability for Noninterference
Measurement”).
To evaluate DINoMe, we apply it to evaluate leakage arising from execution on a RISC-V
BOOM core [14], a state-of-the-art public domain processor design. Our improvements to
generating logical postconditions for execution permit DINoMe to do so for more than 100 cycles
of this core. This, in turn, permits us to evaluate leakage from cache-based side channels
(Prime+Probe [73] and Flush+Reload [95]) in a variety of scenarios, including cryptographic
key leakage in sliding-window based modular exponentiation (e.g., [75, 3]), leakage of secrets due
to speculative execution [57, 65], and how this leakage is (incompletely) mitigated by proposed
improvements such as ScatterCache [93] and PhantomCache [90]. In each case, we measure
interference and additionally generate rules to explain why the leakage occurs, and in some cases
refine our view of the leakage using declassification. Our performance evaluation of DINoMe indicates
that while it is not fast enough to support interactive use by the analyst, these types of
analyses complete in times ranging from seconds to under 15 minutes (using horizontal scaling),
after an initial phase to assemble the logical postcondition of up to (only) two hours on (only) a
single core.
The rest of this chapter is structured as follows. We introduce our declassification in Sec. 5.1.
We then present our method for interpreting leakage in Sec. 5.2. We address various implementation
challenges in Sec. 5.3, and then evaluate our tool, DINoMe, through several case studies in Sec. 5.4.
We discuss limitations in Sec. 5.5 and then summarize this chapter’s results in Sec. 5.6.
5.1 Measuring Interference with Declassification
Some sources of information leakage may be desirable or inevitable; e.g., the results of a password
check will tell whether an input is the correct password, and so will “leak” information about the
correct password. To exclude intended leakage from the analysis, it will be helpful to provide
a method to exempt some identified information leakages specified by the analyst, allowing the
analysis to focus on the leakage that remains. Specifically, our methodology seeks to assess the
degree to which a procedure permits secrets to be distinguished by the attacker using attacker-
observable and declassified information but not by the declassified information alone.
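As a toy illustration of this goal, the Python sketch below compares two password checkers under the declassification policy "whether the guess equals the password". It is a deliberate simplification, not our formal definition: there are no other inputs I, everything is deterministic, and a pair of secrets is counted as leaking beyond the policy if some attacker input yields the same declassified value for both secrets but different observable outputs. The two-character alphabet and all function names are assumptions made for this example.

```python
from itertools import combinations, product

ALPHABET = "ab"
WORDS = ["".join(p) for p in product(ALPHABET, repeat=2)]  # aa, ab, ba, bb


def delta(c, s):
    """Declassified information: whether the guess equals the password."""
    return c == s


def proc_equal(c, s):
    """Observable output of an ideal checker: only the accept/reject bit."""
    return c == s


def proc_prefix(c, s):
    """Observable output of a leaky checker: length of the matching prefix."""
    n = 0
    for x, y in zip(c, s):
        if x != y:
            break
        n += 1
    return n


def residual_pairs(proc):
    """Secret pairs told apart by the output for some input on which delta agrees."""
    return [
        (s, t)
        for s, t in combinations(WORDS, 2)
        if any(delta(c, s) == delta(c, t) and proc(c, s) != proc(c, t) for c in WORDS)
    ]


print(len(residual_pairs(proc_equal)))   # 0
print(len(residual_pairs(proc_prefix)))  # 4
```

The ideal checker reveals nothing beyond the declassified bit, whereas the prefix-revealing checker separates four of the six secret pairs even when the declassified bit is identical.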
Let ∆ ← δ(C, I,S) denote the allowed information exposure (e.g., for a password checker, ∆ is
whether the input is the legitimate password), and let
Πproc,δ(C, I,S,O,∆) ← Πproc(C, I,S,O) ∧Πδ(C, I,S,∆)
where Πδ(C, I,S,∆) is a logical postcondition for δ that relates ∆ to C, I, and S. Then, we can
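define the attacker’s accessible set Y^δ_S of 〈C,O,∆〉 tuples and allowed accessible set D^δ_S consistent
[…]
$D^{\delta}_{S} = \{\, \langle C, \Delta \rangle \mid \exists O, I : \langle C, O, \Delta, I \rangle \in X^{\delta}_{S} \,\}$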
Since the declassified information is allowed to leak, we are concerned only with cases where
the secret is distinguishable by 〈C,O,∆〉 but not by 〈C,∆〉. Here, we define a set X̃^δ_{S,S′} to include
[…]
〈C,O,∆, I〉 ∈ X^δ_S ∪ X^δ_{S′}
[…]
O(ovar) ← S(svar)[0 : 3]
(a) An artificial procedure
δi-j(C, I, S)
(c) Measurement with vs. without declassification
Figure 5.1: Declassification example
$\hat{J}^{\delta}_{n} = \operatorname{avg}_{S, S' \,:\, |S| = |S'| = n \,\wedge\, S \cap S' = \emptyset} \; \hat{J}^{\delta}(S, S')$    (5.5)
To illustrate the use of Ĵ^δ_n, consider the simple procedure shown in Fig. 5.1(a). In this procedure,
S(svar) is an 8-bit value, and proc simply outputs the lowest 4 bits as O(ovar). The declassification
policy shown in Fig. 5.1(b) allows the i-th to j-th bits of S(svar) to be released. We evaluate Ĵ^δ_n
with differently parameterized declassification policies in Fig. 5.1(c). Specifically, when the lowest
4 bits (i = 0, j = 3) are declassified, then the additional leakage from proc is nothing, which is
demonstrated by the “proc + δ0-3” curve. When the declassification policy declassifies all but the
lowest 4 bits (i = 4, j = 7), then the additional leakage by proc is maximized, as shown by the
“proc + δ4-7” curve. Intuitively, if O(ovar) and ∆(‘info’) do not overlap (e.g., “proc + δ4-7” and
“proc + δ4-5”), then the Ĵ^δ_n curve should be higher than Ĵn, whereas if O(ovar) includes all of
∆(‘info’) (e.g., “proc + δ0-3” and “proc + δ0-1”), then Ĵ^δ_n should be lower than Ĵn. A hybrid case
occurs when O(ovar) includes a portion of ∆(‘info’) (e.g., “proc + δ2-5”), where Ĵ^δ_n is lower than Ĵn
when n is small but becomes larger when n is large. This is consistent with the interpretation that
Ĵ^δ_n with small n primarily reflects the number of secret values for which interference occurs [100];
e.g., when n = 1, two secret values share bits 0–1 (and so cannot be distinguished by bits 0–3 after
declassifying bits 2–5) in 25% of cases, but share bits 0–3 (and so cannot be distinguished using
them) in only 6.25% of cases. Larger n, in contrast, better reflects the amount of leakage that
occurs [100]. For example, in a random partition of all 2^8 values into sets S and S′ of equal size
(i.e., n = 2^7), every value for bits 2–5 is represented in both S and S′ with high probability. In
conjunction with the additional bits 0–1 output in O (yielding six bits of the secret value in total),
however, these bits give the attacker greater distinguishing power than do bits 0–3 alone.
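The two percentages quoted for n = 1 can be checked by brute force. The Python sketch below assumes the two 8-bit secret values are drawn independently and uniformly (so a value may be paired with itself); the helper names bits and share_fraction are ours.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def share_fraction(lo: int, hi: int) -> float:
    """Fraction of ordered pairs of 8-bit values that agree on bits lo..hi."""
    values = range(256)
    same = sum(bits(s, lo, hi) == bits(t, lo, hi) for s in values for t in values)
    return same / 256 ** 2


print(share_fraction(0, 1))  # 0.25
print(share_fraction(0, 3))  # 0.0625
```

Agreement on two bits occurs with probability 2^-2 and agreement on four bits with probability 2^-4, matching the 25% and 6.25% figures.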
5.2 Interpreting Leakage
Our defined quantitative leakage metric can measure the interference of a secret with the outputs
observable to the attacker. For this to be useful to an analyst, however, it is helpful to provide
guidance as to why this leakage occurs. Specifically, while the conditions under which leakage
occurs are already represented in the postcondition, it may be difficult to understand the formula
without further help. In this section, we provide a method for generating short and understandable
rules to explain the leakage to a user.
5.2.1 Noninterference and interference tuples
Our first step toward providing an intuitive explanation for the leakage that occurs is to train a
binary classifier to classify 4-tuples 〈C, I,S,S′〉 into those that illustrate leakage occurring (i.e., that
permit the attacker to distinguish S(svar) and S′(svar ) from the resulting output O) and those
that do not. When using declassification, the interference tuples should only include those where
the secrets can be distinguished using C,O,∆ but not using just C,∆.
More specifically, we define the interference set IS based on (5.3). That is, when the attacker
chooses C, if an observable value is feasible for 〈I,S〉 for some I but is never possible for 〈I′,S′〉 for
[…]
∃O,∆ : 〈C,O,∆, I〉 ∈ X^δ_S ∧ 〈C,∆〉 ∈ D^δ_S ∩ D^δ_{S′}
[…]
where S = {S(svar)} and S′ = {S′(svar)}.
The noninterference set NS should include two types of tuples. For an attacker-chosen C, if there
is an observable value O that is feasible for an 〈I,S〉 pair and an 〈I′,S′〉 pair, tuple 〈C, I,S,S′〉 belongs
to NS as it is an example where no interference occurs. In addition, for an attacker-chosen C, if
there is a declassification value ∆ that is feasible for 〈I,S〉 but not 〈I′,S′〉 for any I′, then 〈C, I,S,S′〉
[…]
∃O,∆ : 〈C,O,∆, I〉 ∈ X^δ_S […]
∃O,∆ : 〈C,O,∆, I〉 ∈ X^δ_S […]
where S = {S(svar)} and S′ = {S′(svar)}.
Since NS and IS are large in practical scenarios, enumerating all tuples is generally unfeasible.
Instead, we generate samples in each set to train a machine learning model, from which explanations
of the leakage will be extracted (as described below). Doing so with modern SAT solvers, however,
typically results in samples that cover NS and IS unevenly, since solvers generally enumerate the
next solution by simply adding a conflict constraint to block out previous solutions; as a result, the
next solution found is typically close to the previous. Another drawback of using this “blocking”
method to sample is that we cannot parallelize the sampling.
For this reason, we sample from NS and IS using hash-based sampling (cf., Chapter 4). Specifically,
we sample a limited number of solutions by adding a random universal hashing constraint
to the formula given to the solver. Due to the hash function’s universality, we can run multiple
samplers in parallel to generate a large number of uniformly distributed solutions. In most cases,
the sizes of the sampled sets N̂S and ÎS differ either due to differences in the sizes of NS and IS
or due to the solving difficulty of one set compared to the other. We associate a sample weight to
each element to ensure that the weight of each set is equal in the training process described below.
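The sampling and weighting steps can be sketched in Python as follows. This is an illustrative stand-in rather than our implementation: a brute-force enumeration over 8-bit assignments replaces the SAT solver, random parity (XOR) constraints serve as the universal hash, an empty cell is handled by drawing a fresh hash, and the two predicates standing in for IS and NS are hypothetical.

```python
import random

NBITS = 8  # toy assignment width; real formulas have far more variables


def random_xor_hash(rng: random.Random, nbits: int, m: int):
    """Draw m random parity constraints, each a (bit mask, required parity) pair."""
    return [(rng.getrandbits(nbits), rng.getrandbits(1)) for _ in range(m)]


def in_cell(x: int, constraints) -> bool:
    """True if assignment x satisfies every parity constraint."""
    return all(bin(x & mask).count("1") % 2 == parity for mask, parity in constraints)


def hash_sample(predicate, rng: random.Random, m: int):
    """Return the solutions of predicate that fall into one random hash cell."""
    solutions = [x for x in range(1 << NBITS) if predicate(x)]  # stand-in for the solver
    if not solutions:
        return []
    while True:
        constraints = random_xor_hash(rng, NBITS, m)
        cell = [x for x in solutions if in_cell(x, constraints)]
        if cell:  # retry with a fresh hash if the chosen cell is empty
            return cell


def class_weights(interference, noninterference):
    """Per-element weights that give both sampled sets the same total weight."""
    return 0.5 / len(interference), 0.5 / len(noninterference)


rng = random.Random(2021)
sampled_i = hash_sample(lambda x: x % 4 == 0, rng, m=3)  # hypothetical stand-in for IS
sampled_n = hash_sample(lambda x: x % 4 != 0, rng, m=3)  # hypothetical stand-in for NS
w_i, w_n = class_weights(sampled_i, sampled_n)
print(len(sampled_i), len(sampled_n), w_i, w_n)
```

Because each sampler draws its own hash, several such samplers can run in parallel; the weights then compensate for the two sampled sets differing in size.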
5.2.2 Interpretation through a rule-based method
Given N̂S and ÎS—i.e., 〈C, I,S,S′〉 tuples labeled according to whether they illustrate noninter-
ference or interference—we could train an interpretable machine-learning model and then extract
rules to explain to the user what gives rise to interference. A natural such model to consider is a
decision tree. In a decision tree, each decision node (i.e., interior node) is a predicate on features of
a 〈C, I,S,S′〉 tuple, and its two children correspond to a true or false evaluation of this predicate on
a tuple, respectively. A 〈C, I, S, S′〉 tuple is classified by traversing the tree from its root, following
the branch from each decision node corresponding to the result of evaluating the predicate at that
node on the tuple. Each leaf is labeled with an estimate of the probability that a tuple constrained
by the predicates’ evaluations from the root to that leaf is in IS . We will discuss what features we
include in the process of building decision trees in Sec. 5.2.3, but an example might be individual
variables (e.g., C(cvar )).
A single decision tree can easily grow to be deep and complex, and it can miss some useful
combinations of predicates since each decision predicate is highly influenced by the splits above it
in the tree. To make the decision tree model more powerful in finding useful predicates, we used
a decision-tree ensemble called gradient boosted trees [40]. This process produces m trees denoted
T1, . . . , Tm, with associated weights. If we denote by Tj(〈C, I, S, S′〉) the real number stored at
the leaf to which 〈C, I, S, S′〉 is assigned by Tj, then the weighted sum of Tj(〈C, I, S, S′〉) for
j = 1, . . . ,m is an estimate of the probability that 〈C, I,S,S′〉 ∈ IS.
To interpret tree ensembles, rule-based classifiers (e.g., RuleFit [41], Slipper [26], Pre [39])
were introduced to bridge the interpretability of a decision tree with the modeling power of a tree
ensemble. Our toolchain leverages Skope-rules1 to generate logical rules from the tree ensemble.
Specifically, consider any path from the root to a leaf in a tree Tj , and let πj,1, . . . , πj,ℓ denote the
predicates along that path that evaluated to true. So, for example, if the first predicate encountered
in Tj , say “C(cvar ) = 1”, evaluated to false, then πj,1 = “C(cvar ) ≠ 1”. Then, Skope-rules
constructs a rule by conjoining πj,1, . . . , πj,ℓ, with the caveat that it limits the number of predicates
included in any rule by heuristically pruning them.
Each such rule has an associated precision and recall, which we evaluate empirically using a
validation set held out from N̂S and ÎS during training. That is, the recall of a rule is the fraction
of validation samples held out from ÎS for which the rule evaluates to true, and the precision of
the rule is the fraction of validation samples (from ÎS or N̂S ) for which the rule evaluates to true
that were held out from ÎS . We further prune rules by iteratively removing conjuncts from a long
rule if the precision of the resulting rule is at least 95% of the original. We then rank order rules
according first to precision, and then according to recall.
1 https://skope-rules.readthedocs.io/
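The evaluation, pruning, and ranking steps just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the Skope-rules implementation: the sample encoding (a dictionary of feature values), the helper names `holds`, `precision_recall`, `prune`, and `rank`, and the greedy order in which conjuncts are tried are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, int]                       # feature name -> value (hypothetical encoding)
Conjunct = Tuple[str, Callable[[Sample], bool]]
Rule = List[Conjunct]                         # a rule is a conjunction of predicates


def holds(rule: Rule, x: Sample) -> bool:
    """A rule evaluates to true when every conjunct does."""
    return all(pred(x) for _, pred in rule)


def precision_recall(rule: Rule, val_is: List[Sample], val_ns: List[Sample]):
    """Precision and recall of a rule for IS, measured on held-out validation samples."""
    tp = sum(holds(rule, x) for x in val_is)  # matched samples held out from IS
    fp = sum(holds(rule, x) for x in val_ns)  # matched samples held out from NS
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(val_is) if val_is else 0.0
    return precision, recall


def prune(rule: Rule, val_is, val_ns, keep: float = 0.95) -> Rule:
    """Greedily drop conjuncts while precision stays >= keep * original precision."""
    base, _ = precision_recall(rule, val_is, val_ns)
    current = list(rule)
    shortened = True
    while shortened and len(current) > 1:
        shortened = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            p, _ = precision_recall(candidate, val_is, val_ns)
            if p >= keep * base:
                current, shortened = candidate, True
                break
    return current


def rank(rules: List[Rule], val_is, val_ns) -> List[Rule]:
    """Order rules by precision first, then by recall (both descending)."""
    return sorted(rules, key=lambda r: precision_recall(r, val_is, val_ns), reverse=True)
```

On a toy validation set, a two-conjunct rule whose second conjunct does not contribute to precision is shortened to a single conjunct, and its recall rises accordingly.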
As we will show, these rules can assist in a quantitative analysis of the leakage. For example,
a procedure may include a backdoor that leaks the secret through a specific attacker-controllable
condition (e.g., leaking the secret only when a 64-bit attacker-controlled variable equals 0). The
worst-case interference could be masked by a large number of weak interferences, as we mentioned
in Sec. 4.1. In that case, the interpretable rule could reveal this specific condition and help the
developer re-evaluate the interference under that condition. The case studies in Sec. 5.4.3–5.4.5
illustrate how we choose the attacker-controlled inputs based on the interpretable leakage rules and
then re-evaluate the interference caused by a specific side-channel vector.
5.2.3 Feature engineering
The utility of the rule generation described in the previous section depends critically on the
features of each 〈C, I, S, S′〉 tuple exposed when training the tree ensemble, from which the
predicates making up the decision nodes of each tree are formed. One factor that makes feature
engineering especially critical here is that the SAT solver used to produce elements of ÎS and N̂S
requires that the conditions defining IS and NS (i.e., conditions (5.6) and (5.7)) be presented to
the SAT solver in terms of binary variables only. As such, each solution generated by the SAT solver
is expressed as an assignment to these binary variables. While for some hardware logic, a binary
representation of the relevant variables is most natural, for other types of logic (e.g., on integers),
it is not.
For this reason, we augment each binary solution returned by the SAT solver (i.e., each 〈C, I, S,
S′〉 tuple) with additional features. First, we reconstruct features in a type-aware way from their
binary representations. For example, if a variable was initially an integer before being reduced to
a collection of binary variables in the formula presented to the SAT solver, we recover the integer
value from the bit-vector solution and include it as a feature on which the tree ensemble can be trained.
With such type-aware features, predicates such as, e.g., “S(svar ) < 15” can be learned in a search
for simple predicates testing only a single feature, i.e., unary predicates.
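As a minimal sketch of this type-aware reconstruction, the Python fragment below rebuilds integer features from a bit-level solution. The naming scheme `name[i]` for bit variables, the least-significant-bit-first ordering, and the function names are assumptions made for illustration; they are not taken from DINoMe.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned integer, bits[0] being the LSB (assumed order)."""
    return sum(bit << i for i, bit in enumerate(bits))


def add_typed_features(solution, int_vars):
    """Return the binary solution augmented with one integer feature per declared variable.

    solution: dict mapping bit names such as "svar[0]" to 0 or 1 (hypothetical naming).
    int_vars: dict mapping a variable name to its bit width.
    """
    features = dict(solution)
    for name, width in int_vars.items():
        features[name] = bits_to_int([solution[f"{name}[{i}]"] for i in range(width)])
    return features


solution = {"svar[0]": 1, "svar[1]": 0, "svar[2]": 1, "svar[3]": 1,
            "cvar[0]": 0, "cvar[1]": 1}
features = add_typed_features(solution, {"svar": 4, "cvar": 2})
# A unary predicate such as "S(svar) < 15" can now be tested on one feature.
svar_below_15 = features["svar"] < 15
```

The original bit-level features are kept alongside the recovered integers, so the tree ensemble remains free to split on either representation.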
Unary predicates, however, will be unable to naturally capture some relationships resulting in
leakage. For example, if leakage happens only when S(svar ) > C(cvar ), permitting only unary
predicates will result in a boundary characterized point-by-point, e.g., “S(svar ) ≥ θ ∧ C(cvar ) < θ”
where θ = 1, 2, . . . We thus expanded our feature set to permit linear combinations of some features
(e.g., “S(svar )−C(cvar )”), chosen by a linear classifier described below. To accommodate branching
in the procedure that results in discontinuities in the boundary between IS and NS , we opted for
a local linear classifier (e.g., [36, 82]).
[Figure 5.2: Finding linear combinations of features near anchor points. Legend: anchor points; tuples in IS; tuples in NS; local linear classifiers; more anchor points.]
More specifically, to train the classifier with local constraints, we pick anchor points, around
each of which we train a local classifier that best separates the nearby samples in ÎS and N̂S . (See
Fig. 5.2.) To select anchor points, we first find pairs of 〈C, I, S, S′〉 tuples, one from ÎS and one
from N̂S , that are neighbors in one feature (i.e., after ranking all tuples by this feature, the pair are
adjacent in the ranking) and then take, as the pair’s midpoint tuple, their per-feature means. We
then select anchors uniformly at random from these midpoints. For each anchor, we train a linear
classifier using the tuples in ÎS and N̂S that are within a threshold Euclidean distance from the
anchor. The linear combination of features used in this linear classifier is then added as another
feature to each 〈C, I, S, S′〉 tuple.
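The anchor-selection step can be illustrated with the following Python sketch. It assumes each tuple has already been flattened into a numeric feature vector; the function names and the fixed random seed are hypothetical choices for the example, and the training of the local classifier itself is omitted.

```python
import random


def midpoints(is_samples, ns_samples):
    """Midpoints of IS/NS pairs that are adjacent when all tuples are ranked by one feature."""
    labeled = [(x, 1) for x in is_samples] + [(x, 0) for x in ns_samples]
    n_features = len(labeled[0][0])
    mids = []
    for f in range(n_features):
        ranked = sorted(labeled, key=lambda item: item[0][f])
        for (a, label_a), (b, label_b) in zip(ranked, ranked[1:]):
            if label_a != label_b:  # one tuple from IS and one from NS
                mids.append(tuple((u + v) / 2 for u, v in zip(a, b)))
    return mids


def pick_anchors(mids, k, seed=0):
    """Select up to k anchors uniformly at random (without replacement) from the midpoints."""
    rng = random.Random(seed)
    return rng.sample(mids, min(k, len(mids)))


is_samples = [(3.0, 1.0), (4.0, 0.0)]
ns_samples = [(1.0, 2.0), (2.0, 3.0)]
mids = midpoints(is_samples, ns_samples)
anchors = pick_anchors(mids, k=1)
```

In this toy example each of the two features contributes exactly one IS/NS neighbor pair, so two candidate midpoints are produced and one of them is chosen as an anchor.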
5.3 Implementation
We developed our approach for evaluating and interpreting leakage, described in Sec. 5.1 and
Sec. 5.2, with an eye toward applying it to evaluate and understand leakage from hardware designs.
To do so, we define the procedure proc to be a hardware design, say written in Verilog, in its initial
state but with a predefined program stored in its memory. Our system, DINoMe, enables the user
to annotate the configuration by marking components of the hardware state as attacker-controlled
(i.e., in VarsC), attacker-observable (in VarsO), secret (in VarsS), or otherwise unknown to the
attacker (in VarsI); to simplify discussion below, we will assume there is one of each, denoted cvar ,
ovar , svar , and ivar , respectively. Our system converts this “procedure,” which we continue to
denote proc, to a cycle-accurate logical formula Πproc that characterizes hardware execution of the
program and that relates C, O, I, and S. The user can also declare a declassification function
δ that operates on the hardware state of the system (we will give examples below), from which
DINoMe similarly produces a logical formula Πδ that characterizes how the declassified information
∆ relates to inputs C, I, and S in the execution of proc. From Πproc and Πδ , DINoMe generates
Ĵδn for varying n (see (5.5)) and, if requested, sample sets ÎS and N̂S from IS (see (5.6)) and NS
(see (5.7)), respectively. These sets seed the generation of the rules for interpreting leakage, as
discussed in Sec. 5.2.
In the subsections below, we will discuss particular challenges we encountered when building
DINoMe and how we overcame them. We focus on how to extract Πproc(C, I,S,O) in Sec. 5.3.1.
In Sec. 5.3.2, we describe simplification techniques we leverage as a preprocessing step before per-
forming projected model counting, described in Sec. 5.3.3. Finally, we discuss our technique for
sampling to create ÎS and N̂S in Sec. 5.3.4.
5.3.1 Extracting Πproc(C, I,S,O)
To analyze the leakage from proc, we need an accurate postcondition Πproc(C, I,S,O) for proc. In
practice, generating a postcondition for an arbitrary procedure is not trivial. Especially here, where
our concern is detecting leakage from a processor implementation when running an application—i.e.,
the procedure proc includes numerous cycles of a cycle-accurate implementation of the processor
logic as well as the software logic—the postcondition will be quite large.
Our general strategy to construct Πproc(C, I,S,O) in these circumstances is to assemble it one
cycle at a time. Yosys [94] provides a framework to convert the Verilog code for a processor design
to its internal register-transfer level (RTL) intermediate language, optimize or modify the design
using a series of passes, and finally translate the design to targeted formula through its back-end
pass. The SMT2 back-end pass defines a data structure for each hardware module representing the
module’s temporary hardware state, a function to implement the module’s state transition from
one cycle to the next, and an initialization function to initialize the module’s state. To incorporate
the software logic of proc, we compile the software to its hardware-readable assembly and load the
assembly into the instruction memory unit.
To mark the symbolic variables, the analyst defines a configuration file to mark as symbolic
each input parameter of proc (in this case, svar , ivar , and cvar ), which can be a software variable
located at a fixed location in the memory unit or a wire/register inside the hardware module. Our
modified Yosys SMT2 backend pass then tracks the constraints associated with this symbolic data
throughout a cycle execution. Specifically, it outputs a logical postcondition τproc(H^{t−1}, H^t) that
relates the hardware state H^{t−1} : VarsH → ValsH at the end of cycle t − 1 to the hardware state H^t
that results from executing cycle t. We also use it to generate initialization logic Ψ^0_proc(C, I, S, H^0)
that characterizes the first-cycle starting state H^0 using the configured symbolic inputs.
Using the transition logic, we construct a cycle-accurate postcondition Ψ^T_proc representing the
logic between symbolic inputs and its internal hardware state one cycle at a time, leveraging the
[…]
We finally define Πproc(C, I, S, O) by defining O in terms of the sequence of hardware states 〈H^t〉_{t=0}^T:
Πproc(C, I, S, O) = Ψ^T_proc(C, I, S, H^T) ∧ Γ(〈H^t〉_{t=0}^T, O)    (5.8)
For example, in cache-based side channels, the observable parameters are whether there is a cache
hit/miss during the execution, which is constructed using the values of the s2 hit register across
the execution (as demonstrated in Sec. 5.4.3).
In our experiments, we selected T to ensure the termination of the execution, based on our
knowledge gained by studying the CPU. A more conservative method would be to track the CPU
pipeline and call the SAT solver each cycle to check whether the last instruction has certainly
committed. We have confirmed that adding more cycles after the termination of the execution does
not affect Πproc meaningfully, as the additional cycles do not process any valid opcodes and so only
trivially change the hardware state.
5.3.2 Preprocessing formula for #∃SAT
Applying a correct combination of simplification techniques is critical to scaling the sampling
[…] to compute Ĵδn. Lagniez et
al. [63] provide a summary of the options. As defined in Sec. 5.1 and Sec. 5.2.1, our computation
task is projected model counting (#∃SAT) [6] which counts feasible assignments of selected variables
in a propositional formula. The complexity of the counting problem is #NP-hard.
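To make the counting task concrete, the following Python sketch computes a projected model count by brute-force enumeration on a tiny CNF formula. It is meant only to illustrate the definition (count distinct assignments to the selected variables that extend to a full model); the DIMACS-style clause encoding and the function names are assumptions, and exhaustive enumeration is of course not how a scalable counter works.

```python
from itertools import product


def satisfies(cnf, assignment):
    """cnf: list of clauses, each a list of nonzero ints (DIMACS-style literals)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)


def projected_count(cnf, n_vars, projection):
    """Number of distinct assignments to `projection` that extend to a model of cnf."""
    seen = set()
    for values in product([False, True], repeat=n_vars):
        assignment = dict(zip(range(1, n_vars + 1), values))
        if satisfies(cnf, assignment):
            seen.add(tuple(assignment[v] for v in projection))
    return len(seen)


# (x1 or x2) and (not x1 or x3)
cnf = [[1, 2], [-1, 3]]
full = projected_count(cnf, 3, [1, 2, 3])   # ordinary model count
on_23 = projected_count(cnf, 3, [2, 3])     # count projected onto x2, x3
on_1 = projected_count(cnf, 3, [1])         # count projected onto x1
```

Projecting onto fewer variables merges models that differ only in the remaining (existentially quantified) variables, which is why the projected counts are smaller than the full count here.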
To simplify the logical formula, we use preprocessing (similar to that used in, e.g., [56, 70])
that fully applies equivalence-preserving simplification techniques (e.g., vivification and occurrence
reduction), and then partially applies SAT-preserving simplifications (e.g., literal eliminations, vari-
able eliminations and clause eliminations) targeting variables not in our counting target (i.e., not
marked as svar , ivar , cvar , or ovar ). For example, the partially applied blocked-clause elimination
will remove a clause if it contains a variable not in the counting target such that every resolvent
obtained by resolving on it is a tautology. Especially for hardware designs where the number of
possible states increases with the cycle count but many registers are not modified in some cycles,
preprocessing the formula can substantially reduce the redundancy between the initial and final
cycle formulas.
Although we only need the logic Πproc(C, I,S,O) to describe the relationship between the
attacker-observable outputs O and inputs C, I, S, the translated conjunctive-normal-form (CNF)
propositional formula F produced by the commonly used Tseitin algorithm would have numerous
auxiliary variables. The number of auxiliary variables and clauses increases quickly with the
number of cycles. To reduce the use of auxiliary variables, we applied a state-of-the-art preprocessing
technique for model counting called B+E proposed by Lagniez et al. [62], who also discussed a pos-
sible application of this method to projected model counting. In our modified version of B+E, we
partition the variables in the CNF formula representing Πproc into two disjoint variable sets: Sup
containing the variables in VarsI , VarsC , VarsS , and VarsO , and Dep containing all other variables.
For each variable v in Dep and pair of clauses v̄ ∨ Cj and v ∨ C′i , we then resolve on v and replace
the clauses with their resolvent Cj ∨ C′i […]
Using this algorithm, we eliminate variables in Dep so that the final formula uses only svar ,
ivar , cvar , and ovar without any auxiliary variables. This does have the side effect of introducing
more complicated clauses, however, and so we avoid eliminating v ∈ Dep when v is present in
numerous clauses (e.g., b · a > 500).
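The resolution-based elimination of a Dep variable can be sketched as follows. This Python fragment is not the B+E preprocessor; it shows only the basic step of replacing all clauses containing v or v̄ by their non-tautological resolvents. The clause representation (frozensets of signed integers), the function names, and the occurrence threshold `max_occurrences` are assumptions introduced for the example.

```python
def resolve(c_pos, c_neg, v):
    """Resolvent of a clause containing v and a clause containing -v, or None if tautological."""
    resolvent = (c_pos - {v}) | (c_neg - {-v})
    if any(-lit in resolvent for lit in resolvent):
        return None
    return frozenset(resolvent)


def eliminate(cnf, v):
    """Remove variable v by replacing its clauses with all non-tautological resolvents."""
    pos = [c for c in cnf if v in c]
    neg = [c for c in cnf if -v in c]
    result = {c for c in cnf if v not in c and -v not in c}
    for cp in pos:
        for cn in neg:
            r = resolve(cp, cn, v)
            if r is not None:
                result.add(r)
    return result


def eliminate_dep(cnf, dep, max_occurrences=20):
    """Eliminate each variable in dep unless it occurs in too many clauses (hypothetical cutoff)."""
    cnf = {frozenset(c) for c in cnf}
    for v in dep:
        occurrences = sum(1 for c in cnf if v in c or -v in c)
        if occurrences <= max_occurrences:
            cnf = eliminate(cnf, v)
    return cnf


# Tseitin-style encoding of o = a AND b through an auxiliary variable t:
# a=1, b=2, o=3 are in Sup; t=4 is in Dep.
tseitin = [[-4, 1], [-4, 2], [4, -1, -2], [-3, 4], [3, -4]]
simplified = eliminate_dep(tseitin, dep=[4])
```

After eliminating t, the three remaining clauses state o ↔ (a ∧ b) directly over the Sup variables, so the projected model count over Sup is unchanged while the auxiliary variable disappears.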
                     Cycles
                     5      10     15     20     25     30     35
No simplification    49.1   126.6  177.0  304.2  459.0  667.6  867.6
Simplified           1.10   1.54   1.98   2.03   89.9   167.3  389.0
#∃SAT-preprocess     1.10   1.41   1.45   1.65   1.71   1.77   1.88
Table 5.1: CNF file size (MB) for logic formulas extracted from the RISC-V BOOM core configured
with a small application program, starting from an initial state with some symbolic memory
blocks and cache states (see Sec. 5.4.3). The CNF file size for a one-cycle execution with completely
symbolic initial state is 40MB. Only computations that terminated within 10 minutes are
represented.
In Table 5.1, we present example sizes of CNF formulas generated using different simplification
options, for our case studies in Sec. 5.4. The row denoted by “No simplification” represents
the size of a directly translated CNF formula from the multi-cycle SMT formula, which linearly
increases with the number of cycles. The “Simplified” row is generated by directly applying the
CNF simplifications provided by CryptoMiniSAT 5.0 to the formula in “No simplification”, while the row
“#∃SAT-preprocess” is obtained by incrementally applying the preprocessor described in this section
to each Ψ^T_proc(C, I, S, H^T) before using it to build Ψ^{T+1}_proc(C, I, S, H^{T+1}).
5.3.3 Measurement with declassification using projected model counting
Using CryptoMiniSAT 5.0 as the basic solver, we implemented a counter to estimate the nu-
merator and the denominator in the measurement Ĵδ(S, S′) in (5.5).
5.3.3(a) Counting Ĵδ(S, S′) To compute the measurement Ĵδ(S, S′), DINoMe needs to count
the size of X̃^δ_{S,S′} and X̌^δ_{S,S′}. Directly counting X̃^δ_{S,S′} is not easy as the set difference operation will
[…], it suffices to count
X̌^δ_{S,S′} and X̂^δ_{S,S′} for each sample pair S, S′. Intuitively, counting X̌^δ_{S,S′} could be expressed as a
projected model counting task over 〈C, O, I, S〉 in a quantifier-free SAT problem with two copies
of Πproc shown in F̌ below. F̌ is translated to a CNF proposition where it uses v bit variables to
represent 〈C, O, ∆, I〉 and others to represent 〈S, S′, I, I′〉 and auxiliary variables.
F̌ ← (Πproc(C, I, S, O) ∨ Πproc(C, I′, S′, O)) […] (S(svar ) ∈ S ∧ S′(svar ) ∈ S′) […]    (5.9)
Following Sec. 4.2.2(b), two random, disjoint sets S and S′ of expected size n are specified
with distinct strings p, p̂ ∈ {0, 1}^b where n = |S| / 2^b, and specifically with the constraint that for a
fixed hash function, the hash of each s ∈ S is p and the hash of each s′ ∈ S′ is p̂.
For X̂δS,S′ , we can define another projected model counting [6] task over 〈C,O,∆, I〉 in a
quantifier-free SAT problem F̂ shown below. F̂ uses the logical postcondition Πproc twice, where
the first copy is for the execution with a secret S(svar ) ∈ S and the second checks for existence of a
secret S′(svar ) ∈ S′ leading to a result O also possible with S. F̂ also checks the existence of some
secret (denoted by S′′(svar )) in the secret set S′ leading to the equivalent declassification value ∆
so that we can ensure that S and S′ cannot be distinguished by ∆.
F̂ ← Πproc,δ(C, I, S, O, ∆) ∧ S(svar ) ∈ S
    ∧ Πproc(C, I′, S′, O) ∧ S′(svar ) ∈ S′
    ∧ Πδ(C, I′′, S′′, ∆) ∧ S′′(svar ) ∈ S′    (5.10)
5.3.3(b) Optimizations for counting X̃^δ_{S,S′} and X̌^δ_{S,S′} Enumerating all solutions to (5.9)
and (5.10) using a solver is intractable. To estimate the number of solutions to each instead, we
used the approximate model counting technique due to Chakraborty et al. [15], specifically the
approach taken by Soos and Meel [88].
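That is, by specifying a randomly selected hash function Ĥ^b̂ : {0, 1}^v → {0, 1}^b̂ and an output
[…]
Ž^p̌_{S,S′} = { 〈C, O, ∆, I〉 : 〈C, O, ∆, I〉 ∈ X̌^δ_{S,S′} ∧ Ȟ^b̌(〈C, O, ∆, I〉) = p̌ }    (5.11)
Ẑ^p̂_{S,S′} = { 〈C, O, ∆, I〉 : 〈C, O, ∆, I〉 ∈ X̂^δ_{S,S′} ∧ Ĥ^b̂(〈C, O, ∆, I〉) = p̂ }    (5.12)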
This optimization for model counting will limit the number of calls to the SAT solver by constraining
the number of solutions available, and thus make the counting more scalable for large set sizes. Thus,
[…] for various p̂, p̌.
Our primary departure from the implementation by Soos and Meel [88] lies in utilizing task-
specific properties in our counting tasks to reduce redundant effort in solution searching. Specifically,
since X̂^δ_{S,S′} ⊆ X̌^δ_{S,S′} […] in our counting by defining Ĥ^b̂(〈C, O, ∆, I〉)
to be the b̂-bit prefix of Ȟ^b̌(〈C, O, ∆, I〉) for b̂ ≤ b̌. Then once we have generated solutions in Ž^p̌_{S,S′},
we speed up finding solutions in Ẑ^p̂_{S,S′} for b̂ = b̌ (and so p̂ = p̌) by first checking each solution in
[…]. Only if insufficient solutions are found with
b̂ = b̌ is b̂ reduced and the solver used to generate additional solutions in Ẑ^p̂_{S,S′} for p̂ a b̂-bit prefix
of p̌.
In the case studies in Sec. 5.4, we set the error ε = 0.4 and confidence parameter δ = 0.9 in this
method of estimating the sizes of X̃^δ_{S,S′} and X̌^δ_{S,S′}, from which Ĵδ(S, S′) is estimated using (5.5).
For each set size n, we compute Ĵδn using at least 100 hash functions, i.e., implicit selection of pairs
S, S′ of expected size n.
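The idea behind hash-based approximate counting, and the prefix relationship between the two hash functions used above, can be sketched in Python on an explicitly enumerated toy solution set. This is only an illustration under simplifying assumptions: the real counter never enumerates the solution set but asks a SAT solver for solutions inside one hash cell, and the XOR-based hash family, the function names, and the toy solution set below are chosen for the example rather than taken from the implementation.

```python
import random
from itertools import product


def random_xor_hash(n_bits, b, rng):
    """b output bits; each is the parity of a random subset of input bits plus a random constant."""
    return [([rng.randrange(2) for _ in range(n_bits)], rng.randrange(2)) for _ in range(b)]


def apply_hash(h, x):
    """Evaluate the XOR hash h on the bit tuple x."""
    return tuple((sum(c & xi for c, xi in zip(coeffs, x)) + const) % 2 for coeffs, const in h)


def cell(solutions, h, p):
    """Solutions whose hash equals the output value p."""
    return [x for x in solutions if apply_hash(h, x) == p]


def estimate_count(solutions, h, p):
    """Scale the size of one cell by the number of cells, 2^b."""
    return len(cell(solutions, h, p)) * 2 ** len(h)


rng = random.Random(0)
# Toy solution set over 6 bits: every assignment with x0 = 1 or x1 = 1.
solutions = [x for x in product((0, 1), repeat=6) if x[0] | x[1]]

h_check = random_xor_hash(6, 3, rng)       # plays the role of the longer hash
p_check = (0, 1, 0)
h_hat, p_hat = h_check[:2], p_check[:2]    # shorter hash defined as a prefix of the longer one
estimate = estimate_count(solutions, h_check, p_check)
```

Because the shorter hash is a prefix of the longer one, every solution found in the cell of the longer hash also lies in the corresponding cell of the shorter hash, which is what allows previously generated solutions to be reused before calling the solver again.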
5.3.4 Sampling N̂S and ÎS for interpretable learning
Similar to the counting process, to construct N̂S and ÎS , the sampler will select hash functions H
randomly from a family and output values p randomly from its range to solve for tuples 〈C, I,S,S′〉
for which H (〈C, I,S,S′〉) = p (and are in NS or IS , respectively). In the following experiments, we
will generate up to 100,000 solutions for each of N̂S and ÎS , of which 70% are used for training and 30%
are used for validation.
We cannot directly encode set difference, used in (5.6) and (5.7), using an equivalent quantifier-
free formula. To implement a sampler to generate solutions in the set difference, we will use one
solver (“E-solver”) to search for candidate solutions and another (“F-solver”) to cancel candidates;
this is a commonly used algorithm for an SMT solver to solve exists-forall problems (e.g., see [32]).
E-Solver with H and p generates 〈C, I, S, S′, O, ∆〉 satisfying
Πproc,δ(C, I, S, O, ∆) ∧ Πproc,δ(C, I′, S′, O′, ∆) ∧ O ≠ O′ ∧ H(C, I, S, S′) = p    (5.13)
F-Solver cancels 〈C, I, S, S′, O, ∆〉 satisfying (5.13) if there is some I′′ satisfying
Πproc,δ(C, I′′, S′, O, ∆)    (5.14)
Figure 5.3: Generating examples in ÎS using EF-solver
Here, we will illustrate sampling IS , while sampling NS is similar.
Specifically, the sampler first uses the E-Solver to generate feasible solutions 〈C, I,S,S′〉 (see
(5.13)) that guarantee that, for an attacker-chosen C, the observable value O derived from S with I
could be different from an observable O′ generated by S′ with some I′ when the declassified value ∆
is the same. However, it does not guarantee that O is never feasible for S′. To further test whether
〈C, I,S,S′〉 is in ÎS , we use the F-Solver to test whether 〈I′′,S′〉 for some I′′ could generate the same O
as 〈I,S〉 when they share the declassification value ∆, in which case we cancel the
solution. That is, 〈C, I,S,S′〉 satisfying (5.13) but not (5.14) will be included in ÎS .
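The interplay of the two solvers can be illustrated on a toy procedure by replacing both SAT solvers with exhaustive search. The Python sketch below is a simplification under stated assumptions: `proc` and `delta` are hypothetical stand-ins for Πproc and Πδ, the value domains are tiny, and the hash constraint H(C, I, S, S′) = p used to randomize sampling is omitted.

```python
from itertools import product

C_VALS, I_VALS, S_VALS = (0, 1), (0, 1), (0, 1, 2, 3)


def proc(c, i, s):
    """Toy procedure: leaks the lowest secret bit only when the attacker sets c = 1."""
    return s & 1 if c == 1 else i


def delta(c, i, s):
    """Toy declassification function: nothing is declassified."""
    return 0


def e_solver():
    """Candidates per (5.13): some I' makes S' yield a different O under the same Delta."""
    for c, i, s, s2 in product(C_VALS, I_VALS, S_VALS, S_VALS):
        o, d = proc(c, i, s), delta(c, i, s)
        if any(proc(c, i2, s2) != o and delta(c, i2, s2) == d for i2 in I_VALS):
            yield (c, i, s, s2), o, d


def f_solver_cancels(c, s2, o, d):
    """Cancel per (5.14): some I'' lets S' produce the same O under the same Delta."""
    return any(proc(c, i3, s2) == o and delta(c, i3, s2) == d for i3 in I_VALS)


candidates = list(e_solver())
is_samples = [t for t, o, d in candidates if not f_solver_cancels(t[0], t[3], o, d)]
```

In this toy setting, all candidates with c = 0 are cancelled because the output merely echoes the attacker-unknown input, while the surviving tuples all have c = 1 and secrets of different parity, mirroring the backdoor-style condition discussed in Sec. 5.2.2.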
After generating enough 〈C, I,S,S′〉 tuples in N̂S and ÎS , the interpretation module trains local
support vector machine (SVM) classifiers [37] around each of 50 anchor points, after ruling out
data whose normalized Euclidean distance (i.e., after scaling each attribute to a value between
0 and 1, use Euclidean distance divided by the number of attributes) is more than 0.2 from the
anchor. Then a logistic regression model for NS and IS is learned using a gradient boosted tree
implementation xgboost [19]. To generate the interpretable models, we implemented the rule learner
using Skope-rules.
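The distance-based filtering around an anchor can be sketched as follows. The Python fragment follows the description above (min-max scaling of each attribute, then Euclidean distance divided by the number of attributes); the function names and the toy data are assumptions, and the SVM training itself is left out.

```python
import math


def min_max_scale(rows):
    """Scale each attribute (column) to the range [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
            for row in rows]


def normalized_distance(x, y):
    """Euclidean distance divided by the number of attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) / len(x)


def near(anchor, rows, threshold=0.2):
    """Keep only the rows within the threshold distance of the anchor."""
    return [r for r in rows if normalized_distance(r, anchor) <= threshold]


scaled = min_max_scale([(0, 0), (10, 4), (5, 2), (6, 2)])
kept = near((0.5, 0.5), scaled)
```

Only the two tuples close to the anchor survive the filter in this example; the local classifier for that anchor would then be trained on them alone.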
5.4 Case Studies
In this section, we illustrate DINoMe by describing its application to the BOOM core
(https://github.com/riscv-boom/riscv-boom), an open-source RISC-V core that is susceptible to
cache-based side channels and Spectre attacks. The goal of these case studies was to illustrate
our methodology and to show how it can be useful to system analysts. We require these analysts to
specify the secret to protect and attacker-controlled and attacker-observable variables but, critically,
not the specific attacker algorithm.
[Figure 5.4: Way-associated cache in BOOM]
• […] memory accesses. With different parameter configurations for BOOM (i.e., number of cache ways w
and whether to enable memory sharing), this case study shows how Ĵδn curves demonstrate the
effects of these settings on the leakage. Moreover, we implemented and evaluated two possible
mitigations, namely ScatterCache [93] and PhantomCache [90], both of which use a per-
security-domain memory-to-cache mapping to reduce but not eliminate the cache leakage. Our
measurements using Ĵδn illustrate which mitigation is better for a specific BOOM setting.
• We used DINoMe to demonstrate information leakage due to cache-based side channels from a
modular exponentiation function commonly used in cryptographic algorithms. The rule-based
interpretation explains how to choose attacker-controlled variables and which portions of the secret
are leaked.
• We evaluated software code snippets causing speculative execution and demonstrated how to use
declassification to narrow in on leakage caused by speculative execution (i.e., by declassifying
other leakage to reveal it). We found that some software with a short speculative window is not
sufficient to cause out-of-bound memory leakage in the latest version of BOOM.
5.4.1 BOOM configurations
BOOM provides a configurable L1 cache module using a random replacement policy, where its
memory-to-cache mapping is shown in Fig. 5.4. In the following experiments, we used pocket-size
hardware modules to replace the modules in the BOOM v2.2.3 configuration. Although the system
we evaluated is configured to be much smaller than an actual system, it preserves all of the original
functionality; analyzing artificially small but otherwise faithful configurations of a system is not
uncommon in model checking, for example (e.g., [8, 2]). Specifically, we set the cache line size to
bbytes = 64B and the total L1 data cache size to 1KB (16 cache lines in total). We then varied the
cache ways w and sets c (i.e., subject to w × c = 16) in Sec. 5.4.3 but used a fixed setting w = 2,
c = 8 for other evaluations. For the main memory, we set the memory size to 4KB and thus a
memory address is only 12 bits. For evaluation purposes, we used the upper half of the memory
address space as instruction memory and the lower half as data memory. To simplify the following
analysis, we removed the page table walker (PTW) module and assumed virtual addresses were the
same as physical addresses. For the instruction fetch, we set the fetch width to 4 and configured
the L1 instruction cache to a 1KB, 8-set, 2-way cache with a customized prefetching module that
preloaded the software workload at the first cycle.
One feature of BOOM is that it supports speculative execution, with which we will experiment
in Sec. 5.4.6. Speculative execution leverages a branch predictor, for which we used the GShare
branch predictor. The logical structure of GShare is shown in Fig. 5.5. When a prediction request
arrives for a branch instruction, the GShare predictor derives a value bidx from certain bits
(denoted ‘idx’ in Fig. 5.5) in the instruction address and an instruction history register and then
uses bidx to index into a table to which we refer as ‘bpd’. Each entry of the ‘bpd’ table includes a
label called ‘CFI’ and a 2-bit ‘state’, of which one bit indicates whether the entry holds a strong or
weak prediction and the other bit holds that prediction (i.e., whether the branch will be taken or
not). If the ‘bpd{bidx}.CFI’ value matches the ‘CFI’ portion of the instruction address, then the
predictor uses the ‘bpd{bidx}.state’ value to make a branch prediction. The GShare predictor will
globally tune entries based on executions in any user’s domain. Thus, an attacker can easily affect
the ‘bpd’ table before the victim’s execution, and so we include ‘bpd’ in VarsC . In our evaluation, we
fix the number of ‘bpd’ entries to 4 so that only 2 bits in the instruction address are used as ‘idx’
while another 2 bits (=log2(fetch width)) are used as its ‘CFI’ label.
In the following case studies, we added the ‘bpd’ table in the GShare module to VarsC and
registers in the L1 data cache module including the cache metadata, the replacement state (i.e., the
linear-feedback shift register (LFSR) for the random replacement policy), and the memory-to-cache
mapping (if using a nonfixed mapping) to VarsI .
5.4.2 Logically modeling cache states
The most common cache-based side-channel attacks are Prime+Probe, Flush+Reload, and
their variants (e.g., see [98, 95]). In a Prime+Probe attack, the attacker loads memory blocks to
[…]
then reads (Probes) these same blocks to determine which were evicted by the victim computation
during the Prime+Probe interval. In a Flush+Reload attack, the attacker Flushes a shared-
memory block from cache and then, after a Flush+Reload interval, accesses (Reloads) the
block to determine whether the block was brought back into the cache by the victim computation.
[Figure 5.5: Logical architecture for GShare branch predictor]
To model these attacks in our framework, it is necessary to model the effects on the cache of
the phases before victim execution (the Prime and Flush steps) and to define O to include the
results of the phases after victim execution (the Probe and Reload steps). To do so, we assume
that the adversary has access to memory blocks block1, block2, . . ., blockm aligned to cache lines,
and we define the RISC-V assembly routine acc by which the adversary can access the block with
index ℓ = Ĉ(‘blockIdx’) and empty Ŝ:
acc (Ĉ, Î, Ŝ)
li s0, 0x2000000
add s1, s0, ℓ
sll s1, s1, 6
lbu a2, 0(s1)
[…] ℓ) for each 0 < t ≤ T̂ as in Sec. 5.3.1, where we empirically choose
T̂ = 45.
We use these postconditions in two ways. First, we use them to extract a constraint Γ(〈H^t〉_{t=1}^T, O)
that defines the attacker’s observations O in terms of the hardware states 〈H^t〉_{t=1}^T induced by the
execution (see (5.8)). A naive attempt to do so would be to simply include in O the metadata for
each cache line at every step of the execution. However, this would grant too much power to an
attacker, who should not be given access to the tag values and the exact locations of blocks inside
a set. Instead, we permit only a weaker attacker (cf., abstract noninterference [42]) by defining the
constraint Γ(〈H^t〉_{t=1}^T, O) that represents the view of cache hits and misses immediately observable
[…]
for ℓ = Ĉ(‘blockIdx’). Here, CacheMiss is a BOOM-defined Verilog code snippet that, intuitively,
checks a set of cache lines where block ℓ might reside and returns 1 (in a register called s2 hits) if
none of those cache lines has a valid tag matched with block ℓ (and returns 0 otherwise). In this
way, we characterize the procedure acc using a logical postcondition without manually modeling
CacheMiss.
Second, we permit the attacker to control which of its blocks are loaded into the cache before
the victim runs. Specifically, the predicate Ψ^0_proc(C, I, S, H^0) that controls the initial hardware state
from which the victim executes is modified to constrain which of the attacker’s blocks are present
in cache, as communicated through a reserved variable ‘load’ ∈ VarsC , where C(‘load’) is a
bit vector of length m. That is, attacker block block ℓ should be loaded before the victim runs if
and only if C(‘load’)[ℓ] = 1. To effect this in the Ψ^0_proc(C, I, S, H^0) […]
Of course, we rename variables to ensure no conflicts between copies of Ĥ^t_ℓ included within the
C(‘load’)[ℓ] and O(‘hit’)[ℓ] constraints.
For an attacker, it is sufficient to control the cache using m = 16 blocks as the L1 data cache
consists of only 16 cache lines in our experiments.
5.4.3 Cache-based side channels
[Figure 5.6: Ĵn for Prime+Probe attacks. Panel (b): ∀ℓ : C(‘load’)[ℓ] = 1]
5.4.3(a) Without shared memory Here, we target a victim’s RISC-V assembly proc to
access a secret-indexed memory block not shared with the attacker, by setting the base address in
s0 to a value 0x2000010, in contrast to the one used in acc.
proc (C, I, S)
li s0, 0x2000010
add s1, s0, S(‘secret’)
sll s1, s1, 6
lbu a2, 0(s1)
(5.15)
We experimented with different numbers of cache sets c including c = 1 (i.e., 16-way, 1-set,
fully associative), c = 2 (i.e., 8-way, 2-set), c = 4 (i.e., 4-way, 4-set), c = 8 (i.e., 2-way, 8-set), and
c = 16 (i.e., 1-way, 16-set, direct-mapped). As shown in Fig. 5.6(a), Ĵn increases when the number
of sets increases. Specifically, there is no leakage (Ĵn = 0 for all n) when c = 1. Using fewer cache
sets, each cache set is shared by more memory blocks, and so an attacker will have more difficulty
distinguishing one execution from others. When 1 < c < 16, Ĵn decreases as n grows, since the
attacker can learn only log2(c) bits about the secret and thus may be unable to distinguish secrets
in large sets (i.e., large n).
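The log2(c) bound can be checked with a small Python calculation that follows the address arithmetic of proc in (5.15). The sketch assumes the conventional mapping in which the set index is the block number modulo c; the actual mapping is the one shown in Fig. 5.4, and the function names below are hypothetical.

```python
import math

LINE_BITS = 6  # 64-byte cache lines


def victim_address(secret, base=0x2000010):
    """Address computed by proc in (5.15): (base + secret) shifted left by 6."""
    return (base + secret) << LINE_BITS


def set_index(addr, c):
    """Assumed mapping: block number modulo the number of sets c."""
    return (addr >> LINE_BITS) % c


def distinguishable_classes(c, secret_bits=4):
    """Number of distinct cache sets touched over all secret values."""
    return len({set_index(victim_address(s), c) for s in range(2 ** secret_bits)})


leaked_bits = {c: math.log2(distinguishable_classes(c)) for c in (1, 2, 4, 8, 16)}
```

Under this assumed mapping, the 16 possible values of the 4-bit secret fall into exactly c classes, so observing which set was touched reveals at most log2(c) bits: nothing for c = 1 and the whole secret for c = 16.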
An example interference rule for IS generated as described in Sec. 5.2 with the highest precision
(1.00) and a recall ≈ 0.04 in a 2-way, 8-set cache is:
S(‘secret’)[2] ≥ 1 ∧ S(‘secret’)[1] < 1
∧ S(‘secret’)[0] ≥ 1 ∧ S′(‘secret’)[1] ≥ 1
∧ C(‘load’)[5] ≥ 1 ∧ C(‘load’)[13] ≥ 1
(5.16)
In this rule, the S and S′ conjuncts concretize the least significant 3 bits of S(‘secret’) (i.e.,
S(‘secret’) ≡ 5 mod 8) and the lowest bit of S′(‘secret’) (i.e., S′(‘secret’) ≡ 0 mod 2). The C
conjuncts are C(‘load’)[5] ≥ 1 and C(‘load’)[13] ≥ 1; note that 13 ≡ 5 mod 8. That is, an attacker
could load all blocks block ℓ with ℓ ≡ 5 mod 8 into cache to distinguish a secret S(‘secret’) ≡ 5 mod 8
from S′(‘secret’) mod 8 ∈ {0, 2, 4, 6}.
Our approach could not directly represent C(‘load’)[ℓ] ≡ S(‘secret’) mod c. So, the trees in the
model split the dataset based on the cache set index. As such, there were many other top-ranking
rules similar to (5.16), each focusing on one residue class of the secret value modulo c where c = 8
and constraining C(‘load’)[ℓ] = 1 for all ℓ with that residue class modulo c. Each such rule works
for 1/8 of S’s domain and 1/2 of S′’s domain, thus only for 1/8 × 1/2 ≈ 0.06 of secret pairs. The recall rate
0.04 < 0.06 indicates that priming the corresponding cache set ensures (i.e., precision = 1.0) the
interference but is not necessary to cause it.
Analogously, we can generate rules for the noninterference set NS , as well. One example
with precision 1.0 (i.e., that ensures noninterference) and recall 0.11 constrains the secret’s least-
significant 3 bits to be the same for S and S′:
|S(‘secret’)[2] − S′(‘secret’)[2]| < 1
∧ |S(‘secret’)[1] − S′(‘secret’)[1]| < 1
∧ |S(‘secret’)[0] − S′(‘secret’)[0]| < 1
(5.17)
This analysis illustrates that an attacker can easily distinguish S(‘secret’) and S′(‘secret’) when
priming a cache set used by S(‘secret’) or S′(‘secret’) but not both. It is therefore safe to assume

































Figure 5.7: Ĵn for Flush+Reload attacks. Panel (b): ∀ℓ : C(‘load’)[ℓ] = 0.
chances for leakage. The Ĵn measure under this specific attack is shown in Fig. 5.6(b). The worst
case will leak all of the 4-bit secret when using high-granularity memory-to-cache mapping, i.e.,
where c = 16.
5.4.3(b) With shared memory To evaluate the leakage with memory sharing enabled (i.e.,
with Flush+Reload attacks), we allow the attacker to control and observe all memory blocks
used by the victim by setting the base to 0x2000000 in proc instead of to 0x2000010 (see (5.15)).
Fig. 5.7(a) shows the corresponding Ĵn. The Ĵn curves are similar and close to 1 for all settings,
indicating that the leakage does not have much correlation with w. An example rule for interference
derived using the methodology of Sec. 5.2, having a precision of 1.0 and recall of ≈ 0.04, is
S′(‘secret’) < 2 ∧ S′(‘secret’) ≥ 1 ∧ C(‘load’)[1] < 1 (5.18)
That is, if S′(‘secret’) = 1 then C(‘load’)[1] = 0 results in interference. Indeed, the other top-ranked
rules for this example (not shown) were roughly 32 similar rules, each one setting C(‘load’)[ℓ] = 0
for a specific secret value S(‘secret’) = ℓ or S′(‘secret’) = ℓ. The intuition behind these rules is
that an attacker can precisely detect if S(‘secret’) = ℓ by setting C(‘load’)[ℓ] = 0 (i.e., Flushing
block ℓ so he can later Reload it), and similarly for S′(‘secret’). Going further, if an attacker sets
C(‘load’)[ℓ] = 0 for all ℓ, he can detect the victim’s access to any block ℓ, as shown in Fig. 5.7(b).
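The intuition behind these Flush+Reload rules can be illustrated with a toy model. The sketch below is not the BOOM cache model used in our analysis. It assumes 16 shared blocks, a cache large enough to hold all of them, and a victim that touches exactly one block; the name flush_reload_hits is hypothetical.

```python
def flush_reload_hits(victim_block, flushed_blocks, num_blocks=16):
    """Return {block: reload_hit} for every block the attacker flushed.

    Toy model: all shared blocks start cached, the attacker flushes a subset
    (corresponding to C('load')[l] = 0), the victim touches one block, and the
    attacker then reloads each flushed block and records whether it hits.
    """
    flushed = list(flushed_blocks)
    cached = set(range(num_blocks)) - set(flushed)  # Flush phase
    cached.add(victim_block)                        # victim access refills its block
    return {blk: blk in cached for blk in flushed}  # Reload phase


# Flushing a single block answers one yes/no question about the secret.
print(flush_reload_hits(victim_block=1, flushed_blocks=[1]))  # {1: True}
print(flush_reload_hits(victim_block=7, flushed_blocks=[1]))  # {1: False}

# Flushing every block pinpoints the accessed block.
hits = flush_reload_hits(victim_block=7, flushed_blocks=range(16))
print([blk for blk, hit in hits.items() if hit])  # [7]
```

In this simplified setting, flushing block ℓ alone distinguishes S(‘secret’) = ℓ from every other value, while flushing all blocks identifies the secret exactly.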






















Figure 5.9: PhantomCache (r = 2)
5.4.4 Side-channel-resistant cache designs
To demonstrate the power of DINoMe in comparing different implementations, we evaluate two
cache designs for mitigating side channels, namely ScatterCache [93] and PhantomCache [90].
Unfortunately, Verilog specifications of these designs are unavailable, so we implemented two simplified
cache modules (which we continue to refer to as ScatterCache and PhantomCache) in BOOM,
following the designs described in their papers.
ScatterCache maps a memory block to a cache line using a cryptographic index derivation
function computed using the block’s physical address and a private key. To simulate this index
derivation without choosing a concrete function, in Fig. 5.8, we use a symbolic look-up table denoted
by Mdom per security domain dom (dom = 0 denotes the victim’s domain and dom = 1 denotes
the attacker’s) to store the mapping from memory address to cache line. For security domain
dom, its access to memory contents at physical address paddr and so with block address baddr =
⌊paddr/bbytes⌋ is mapped to cache lines with way index k and set index j = Mdom{baddr}{k} for
k = 0, 1, . . . , w − 1. Similarly, for PhantomCache, we used a domain-specific memory-to-cache
mapping represented by Mrdom to allow a memory block to use cache lines in up to r cache sets
indexed by Mrdom{baddr}{k} for k = 0, 1, . . . , r − 1.
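To make the look-up table concrete, the sketch below builds one possible instance of Mdom by drawing set indices uniformly at random. This is only an illustration: in our analysis the table is symbolic rather than sampled, and the names random_mapping and candidate_lines, the 64-byte block size, and the seed are assumptions introduced here.

```python
import random


def random_mapping(num_domains, num_blocks, ways, num_sets, seed=0):
    """Build M[dom][baddr][k] -> set index, drawn uniformly and independently."""
    rng = random.Random(seed)
    return [
        [[rng.randrange(num_sets) for _ in range(ways)] for _ in range(num_blocks)]
        for _ in range(num_domains)
    ]


def candidate_lines(M, dom, paddr, block_bytes=64):
    """Cache lines (way k, set j) that domain `dom` may use for address `paddr`."""
    baddr = paddr // block_bytes  # block address = floor(paddr / bbytes)
    return list(enumerate(M[dom][baddr]))


# Two domains (0 = victim, 1 = attacker), 32 blocks, a 2-way, 8-set cache.
M = random_mapping(num_domains=2, num_blocks=32, ways=2, num_sets=8)
print(candidate_lines(M, dom=0, paddr=0x240))
print(candidate_lines(M, dom=1, paddr=0x240))
```

Because each domain has its own table, the same physical address generally maps to different cache lines for the victim and the attacker.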




5.4.4(a) Random memory-to-cache mappings First, we experimented without memory
sharing when assuming the memory-to-cache mapping is completely unknown to the attacker. We
² In contrast to the original paper [90], we do not force each memory block to map to r unique cache sets, i.e., we do
not constrain Mrdom{baddr}{k} ≠ Mrdom{baddr}{k′} for k ≠ k′.





































Figure 5.10: ScatterCache, unknown Mdom, memory sharing enabled (Flush+Reload attack). Panel (b): ∀ℓ : C(‘load’)[ℓ] = 0.
ended up with Ĵn = 0 for all n in both ScatterCache and PhantomCache. The attacker
cannot tell which memory blocks are accessed by the victim, as a memory block could be mapped
to any cache line if the mapping is unknown. However, with shared memory, as shown in Fig. 5.10(a)
and Fig. 5.10(b), it is still possible to learn some information about which memory block is
accessed by the victim. Ĵn is high when n is large, indicating the attacker can precisely determine
S(‘secret’) when leakage occurs. Our results indicate that lower cache set granularity leaks more:
in Fig. 5.10(a), c = 1 leaks the most, which is similar to the normal cache. When c > 1, the leakage
is reduced.
Overall, with the same cache set granularity, Ĵn is higher for PhantomCache with r = 2 than for
PhantomCache with r = 1 or ScatterCache when memory is shared. This is because setting
r = 2 allows one physical address to be mapped to more cache sets, which increases the chance of
sharing cache lines across domains.
Intuitively, Flush+Reload is the best attacker strategy for a normal cache design when memory
sharing is enabled. However, for a new cache design, it may not be clear that it is still the best.
Our leakage rules provide some insight for ScatterCache and PhantomCache. For example,
two top-ranking rules for ScatterCache, both with precision ≥ 0.80 and recall of ≈ 0.02, are:
S(‘secret’)[3] ≥ 1 ∧ S(‘secret’)[2] < 1 ∧ S(‘secret’)[1] < 1 ∧ S(‘secret’)[0] < 1















Figure 5.11: PhantomCache, unknown Mrdom, memory sharing enabled (Flush+Reload attack). Panel (b): ∀ℓ : C(‘load’)[ℓ] = 0. Legend: ‘c = 1, r = 1’; ‘c = 4, r = 1’; ‘c = 4, r = 2’; ‘c = 8, r = 1’; ‘c = 8, r = 2’; ‘c = 16, r = 1’.
These rules are similar to (5.18) but with some additional predicates about M0. Specifically, (5.19)
adds I(M0{8}{1}) ≥ 5 ∧ I(M1{8}{1}) ≥ 5 to the rule when setting C(‘load’)[8] = 0 (i.e., the attacker
Flushes block 8) and S(‘secret’) = 8, which indicates that block 8 should occupy line k = 1 in set
j = 5 in both the victim’s and attacker’s domains to ensure leakage about whether S(‘secret’) = 8
when the attacker Reloads block 8.
Thus, an attacker should Flush+Reload all blocks that could share cache lines between the
victim’s and attacker’s domains to cause more leakage. Since the memory-to-cache mapping is
unknown, an attacker may Flush+Reload all shared memory blocks. The resulting Ĵn is shown
in Fig. 5.10(b) for ScatterCache and Fig. 5.11(b) for PhantomCache. Under the equivalent
cache settings, Ĵn is higher when the attacker takes maximum advantage of Flush+Reload
attacks (versus not, shown in Fig. 5.10(a) and Fig. 5.11(a)). We also see that Ĵn for ‘c = 8, r = 2’
is close to that for ‘c = 4, r = 1’, as randomly mapping to 2 out of 8 sets is similar to mapping to
1 out of 4 cache sets. Our evaluation results suggest that ScatterCache and PhantomCache
eliminate side-channel leakage when there is no shared memory and largely restrict it when there is
shared memory, if the address-to-cache mapping is random and remains unknown to the attacker.
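The similarity between ‘c = 8, r = 2’ and ‘c = 4, r = 1’ noted above can be made plausible with a back-of-the-envelope calculation. The sketch below assumes that each of the r set choices is uniform and independent (we do not force the r sets to be distinct) and computes the probability that one fixed cache set is among a block's candidate sets. It is a rough intuition aid, not the Ĵn computation itself, and the function name is illustrative.

```python
from fractions import Fraction


def prob_set_is_candidate(num_sets, r):
    """P(a fixed cache set is among r independent uniform set choices)."""
    return 1 - Fraction(num_sets - 1, num_sets) ** r


p_c4_r1 = prob_set_is_candidate(4, 1)
p_c8_r2 = prob_set_is_candidate(8, 2)
print(p_c4_r1, float(p_c4_r1))  # 1/4 0.25
print(p_c8_r2, float(p_c8_r2))  # 15/64 0.234375
```

Under these assumptions the two configurations give nearly the same chance (0.25 versus about 0.23) that a given set is used by a block.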
5.4.4(b) Declassifying the memory-to-cache mapping When I(M) is unknown to the
attacker, our previous analysis shows that cache-based side channels are mitigated. Werner et
































(b) PhantomCache. Legend: ‘c = 1, r = 1’; ‘c = 4, r = 1’; ‘c = 4, r = 2’; ‘c = 8, r = 1’; ‘c = 8, r = 2’; ‘c = 16, r = 1’.



























Figure 5.13: Memory sharing enabled (Flush+Reload attack), ∆(‘info’) ← I(M) (or I(Mr)). Panel (b): PhantomCache. Legend: ‘c = 1, r = 1’; ‘c = 4, r = 1’; ‘c = 4, r = 2’; ‘c = 8, r = 1’; ‘c = 8, r = 2’; ‘c = 16, r = 1’.
function modexp(b, d)
  e ← 1
  for i ← n to 1 do
    e ← e × e mod M
    if di ≠ 0 then

    li   sp, 0x80000400
    li   a0, 1
    li   a2, M
    li   a3, S(di)
.oneIteration:
    mulw a0, a0, a0
    remw a0, a0, a2
    beqz a3, .NextIteration
    sll  a5, a3, 2
    add  a5, sp, a5
    lw   a5, 0(a5)
    mulw a0, a0, a5
    remw a0, a0, a2
.NextIteration:
(b) Assembly for one iteration
Figure 5.14: Sliding window modular exponentiation with window size W. dn . . . d1 is the private
key d where each di (i = 1, . . . , n) is a W-bit value.
a profiling procedure. If we declassify I(M), the interference Ĵδn will increase: Fig. 5.12(a) shows Ĵδn
due to Prime+Probe attacks in this case, and Fig. 5.13(a) shows the impact of this declassification
on Flush+Reload attacks.
Similarly, using ∆(‘info’)← I(Mrdom), we evaluate PhantomCache’s leakage when the random
mapping is declassified; results are shown in Fig. 5.12(b) and Fig. 5.13(b). Comparing Fig. 5.12(b)
and Fig. 5.12(a), PhantomCache’s leakage (measured by Ĵδn) for unshared memory is higher than
ScatterCache’s when r = 1. The strength of PhantomCache is revealed when r increases,
since it allows memory blocks to map to more than one cache set. Specifically, the leakage for
ScatterCache’s ‘c = 4’ is much less than PhantomCache’s ‘c = 4, r = 1’ but is similar
to PhantomCache’s ‘c = 4, r = 2’. However, PhantomCache with r = 2 provides weaker
protection against Flush+Reload than PhantomCache with r = 1 and ScatterCache.
5.4.5 Leaking exponent in modular exponentiation
The evaluations in Sec. 5.4.3 and Sec. 5.4.4 focused on whether the adversary could detect
the victim’s access to a particular memory block, which is a well-known vector of information
leakage. To further demonstrate the utility of our framework in measuring this type of leakage,
here we consider a classic example whereby the secret is not a memory address, but rather is a
cryptographic secret that, due to the algorithm in use, can influence the victim’s cache footprint.
The particular example we evaluate here is modular exponentiation as used in algorithms such
as RSA. A textbook implementation of modular exponentiation uses a sliding-window method that
is known to leak information in caches [98, 10]. As shown in Fig. 5.14(a), the algorithm leverages
some small powers b[k] of a base b (where k < 2^W − 1) to compute a larger power. Accesses to
those precomputed powers are determined by the window-sized segment di of the private key d in
each loop iteration i. First, this procedure will leak via the cache whether di is zero. Second, since
the precomputed elements are addressed by di, an attacker may identify up to log2(c) bits about di
if those precomputed powers map to different cache sets.
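For readers who want to see where the secret-dependent table access arises, the following is a minimal sketch of a standard left-to-right fixed-window exponentiation. It is a textbook variant written for illustration and is not claimed to match Fig. 5.14(a) line for line (for example, it squares W times per window). The function name windowed_modexp and the returned access trace are assumptions introduced here.

```python
def windowed_modexp(b, digits, M, W):
    """Left-to-right fixed-window modular exponentiation.

    `digits` lists the W-bit windows of the exponent, most significant first.
    Returns (result, trace), where trace records for each window the table
    index that was read, or None when the window is zero and no read happens.
    """
    table = [pow(b, k, M) for k in range(2 ** W)]  # precomputed small powers b[k]
    e, trace = 1, []
    for d_i in digits:
        for _ in range(W):  # shift the accumulated exponent left by W bits
            e = (e * e) % M
        if d_i != 0:  # secret-dependent branch and table access
            e = (e * table[d_i]) % M
            trace.append(d_i)
        else:
            trace.append(None)
    return e, trace


result, trace = windowed_modexp(7, [0b1010, 0b0000, 0b0011], M=1009, W=4)
print(result == pow(7, 0xA03, 1009))  # True
print(trace)  # [10, None, 3]
```

The trace makes both leakage sources visible: whether a window is zero (no table read at all) and, for nonzero windows, which table entry is read.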
To evaluate the one-round leakage of Fig. 5.14(a), we used the RISC-V assembly shown in
Fig. 5.14(b) in BOOM with a 2-way, 8-set cache (c = 8). The Ĵn measure shown in Fig. 5.15(a)
indicates that the amount of leakage for one loop iteration i is limited when W ≤ 4, since the
precomputed b then uses at most 4 × 2^4 = 64 bytes (i.e., one cache line). When 4 < W < 8, the side
channel leaks more about di as W increases. Thus, W = 4 is the best choice to
protect the secret in our cache configuration.
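The dependence of the table's cache footprint on W can be worked out directly. The sketch below assumes a conventional set-indexed cache (set index = block address mod c), 64-byte lines, 4-byte table entries, and the table base 0x80000400 loaded into sp in Fig. 5.14(b). The helper names are illustrative.

```python
LINE_BYTES = 64          # cache line size ("64 bytes (i.e., one cache line)")
ENTRY_BYTES = 4          # each precomputed power is one 32-bit word (lw)
TABLE_BASE = 0x80000400  # sp value in Fig. 5.14(b)


def cache_set(d_i, num_sets=8):
    """Cache set touched when the victim reads table entry d_i."""
    addr = TABLE_BASE + ENTRY_BYTES * d_i
    return (addr // LINE_BYTES) % num_sets


def sets_touched(W, num_sets=8):
    """All cache sets reachable by nonzero W-bit windows (d_i = 0 skips the read)."""
    return {cache_set(d, num_sets) for d in range(1, 2 ** W)}


for W in (1, 4, 5, 8):
    print(W, sorted(sets_touched(W)))

# With W = 8 the set index is exactly bits 4..6 of d_i.
print(all(cache_set(d) == (d >> 4) & 0b111 for d in range(256)))  # True
```

Under these assumptions the whole table stays in one cache set for W ≤ 4, while for W = 8 the set index exposes bits 4 to 6 of di, i.e., log2(8) = 3 bits.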
To further diagnose the cause of leakage, we generated the interference rules for W = 1, W = 4,
and W = 8. When W = 1, we obtain a single rule with precision and recall of 1.0, namely
C(‘load’)[0] ≥ 1 ∧ C(‘load’)[8] ≥ 1
This has no S or S′ related conjuncts, indicating that the 1-bit secret di is fully leaked when an
attacker Primes one cache set. In contrast, when W = 4, the top rules (precision of 1.0, recall
≥ 0.5) include some S or S′ related conjuncts, constraining the secret value to be zero, e.g.,
S(di ) < 1 ∧ C(‘load’)[0] ≥ 1 ∧ C(‘load’)[8] ≥ 1
That is, for a 4-bit secret, the side channel leaks only whether the secret is zero.
When W > 4, however, the most important cause of leakage changes from whether a memory
access happens to which cache set is used by di . For example, when W = 8, one highly ranked rule
(precision of 1.0, recall ≥ 0.04) is
S′(di)[6] < 1 ∧ S′(di)[5] ≥ 1 ∧ S′(di)[4] < 1
∧ S(di)[4] ≥ 1 ∧ C(‘load’)[10] ≥ 1 ∧ C(‘load’)[2] ≥ 1
which indicates that the attacker can distinguish an S′(di ) with S







































Figure 5.15: Ĵn for Modexp in 2-way, 8-set cache. Panel (b): ∀ℓ : C(‘load’)[ℓ] = 1.
S(di)[4 : 6] ∈ {1, 3, 5, 7} if the attacker Primes cache set 2. Similar to the analysis in Sec. 5.4.3(a),
rules for W = 8 illustrate that an attacker can reveal the cache set used by the victim (e.g., secret
bits 4-6) when priming all cache sets.
5.4.6 Cache-based side channels in speculative execution
Spectre and its variants have received widespread attention in recent years. In a Spectre
attack, a CPU predicts the outcome of a conditional branch and executes instructions based on that
prediction, which reduces the delay incurred by those instructions if the prediction was correct. However,
if the prediction is incorrect, some changes to the hardware state caused by speculative
execution persist even after the mispredicted computations have been discarded. These
changes propagate information to exploitable side channels (e.g., cache-based side channels), allowing
the attacker to steal it.
To explore such leaks using our framework, we used the software pseudocode in Fig. 5.16(b)
and Fig. 5.16(c), each of which accesses an element of array arr2 at a secret index arr1[offset]. The
bounds check on offset is dependent on reading arr1.size from memory in Fig. 5.16(b) and on a
complex sequence of computations in Fig. 5.16(c). In the latter case, speculative execution might
leak arr1[offset] through cache-based side channels, i.e., by bringing arr2[(arr1[offset] × 64) & 1023]
into cache. Fig. 5.16(e) shows an important snippet of RISC-V assembly for Fig. 5.16(c) running
on BOOM with a 2-way, 8-set cache. To evaluate the software snippet in Fig. 5.16(b), we
replace the block denoted by .complexDependency (Lines 10–16) with the .shortDependency block in
conditionalAccess(offset, arr1.size)
  if (offset < arr1.size)
    tmp ← arr2[(arr1[offset] × 64) & 1023]
    declassify(arr1[offset])
(a) Conditional memory access

victimFunc(offset, secret)
  arr1[offset] ← secret
  read arr1.size from memory
  conditionalAccess(offset, arr1.size)
(b) No bounds check bypass

victimFunc(offset, secret, arr1.size)
  arr1[offset] ← secret
  arr1.size ← (arr1.size × 257) mod 256
  arr1.size ← (arr1.size × 257) mod 256
  conditionalAccess(offset, arr1.size)
(c) Bounds check bypass

1 .shortDependency:

3 li a0, I(‘arr1.size’)
4 li a1, C(‘offset’)
5 li a2, S(‘secret’)
6 //t3 ← arr1.addr
7 //t4 ← arr2.addr
8 add a3, t3, a1
9 sb s2, 0(a3)
10 .complexDependency:
11 li t1, 0x101
12 li t2, 0x100
13 mul a4, a0, t1
14 remuw a4, a4, t2
15 mul a4, a4, t1
16 remuw a0, a4, t2
17 .conditionalAccess:
18 bleu a0, a1, .end
19 add t3, t3, a1
20 lbu a3, 0x0(t3)
21 sll a3, a3, 6
22 and a3, a3, 0x3ff
23 add a3, t4, a3
24 lbu a4, 0(a3)
25 .end:
(e) Long speculation
Figure 5.16: Speculative execution example. Assembly in (e) is a snippet from compilation of the
pseudocode in (c). Replacing lines 10–16 with (d) gives the analogous assembly for the pseudocode in
(b).
Fig. 5.16(d). Furthermore, we evaluated a mitigation similar to lfence [1], by adding a RISC-V
instruction ‘fence r,r’ just after Line 18 in Fig. 5.16(e).
We assume the attacker can control the offset value C(‘offset’), train the GShare branch predictor
C(‘bpd’) shown in Fig. 5.5, and use Flush+Reload to observe O(‘hit’). If arr2 is shared, the
attacker can use Flush+Reload-style attacks to precisely determine the index into arr2
and thus four bits of arr1[offset]. Note that the secret value S(‘secret’) is assigned to arr1[offset]
as the first step of Fig. 5.16(b) and Fig. 5.16(c). We presume that I(‘arr1.size’) is an attacker-known
but not attacker-controlled variable; thus, we include it as one of the output parameters as well, i.e.,
O(‘arr1.size’) ← I(‘arr1.size’).
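The claim that the arr2 index reveals four bits of arr1[offset] follows from the index expression alone. The sketch below assumes byte-sized array elements (as suggested by the lbu loads), 64-byte cache lines, and a 64-byte-aligned arr2. The function name arr2_line is illustrative.

```python
LINE_BYTES = 64  # assumed cache line size


def arr2_line(secret_byte):
    """Cache-line index within arr2 touched by arr2[(secret * 64) & 1023]."""
    return ((secret_byte * 64) & 1023) // LINE_BYTES


# The touched line equals the low four bits of the secret byte.
print(all(arr2_line(x) == (x & 0xF) for x in range(256)))  # True
print(len({arr2_line(x) for x in range(256)}))  # 16
```

An attacker who learns which of the 16 lines of arr2 was brought into cache therefore learns arr1[offset] mod 16 and nothing about the upper four bits.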
As shown in Fig. 5.17, the Ĵn measures for ‘ShortSpec’ (denoting Fig. 5.16(d)) and ‘Fence’ are
somewhat similar to that for ‘LongSpec’ (denoting Fig. 5.16(e)), contrary to what intuition would
suggest. This counterintuitive result arises because leakage from in-bounds array accesses
is also being counted. By declassifying in-bounds array elements (i.e., declassifying arr1[offset] if
C(‘offset’) < I(‘arr1.size’)), we obtain a better picture of when leakage occurs. Specifically, when


















Figure 5.17: Ĵδn for Spectre in different procedures
with the short dependency (‘ShortSpec+δ’) and proc with the fence mitigation (‘Fence+δ’) do not
leak out-of-bounds memory contents, while the proc with the longer dependency (‘LongSpec+δ’)
continues to leak secret data; indeed, its leakage is just slightly lower than that of ‘complexDepend’.
In generating interference rules for proc with a long speculation (Fig. 5.16(e)), the linear feature
L0 = 0.005 × S(‘secret’) − 0.003 × S′(‘secret’) − 0.494 × C(‘offset’) + 0.496 × I(‘arr1.size’)
≈ 0.5 × I(‘arr1.size’) − 0.5 × C(‘offset’)
and specifically the conjunct L0 < 1 appears in many of the top-ranked rules. Using the approximation
of L0 above, L0 < 1 implies that I(‘arr1.size’) < C(‘offset’) + 2, i.e., C(‘offset’) ≥ I(‘arr1.size’) − 1,
and so the offset is out-of-bounds except in the boundary case C(‘offset’) = I(‘arr1.size’) − 1.
An example top-ranked rule with precision 1.0 and recall 0.30 is
L0 < 1 ∧ C(‘bpd{0}.state’)[1] < 1 ∧ |S(‘secret’)[2] − S′(‘secret’)[2]| ≥ 1
This rule indicates that an attacker can determine the third bit of the secret when the second bit of
the state of the prediction entry C(‘bpd{0}.state’) is 0 (‘strongly untaken’) or 1 (‘weakly untaken’).
Analogous rules appear in the list for each of bits 0-2 and 4 of the secret. Other highly ranked rules
(also with precision 1.0 and recall 0.30) are
L0 < 1 ∧ C(‘bpd{0}.CFI’)[0] ≥ 1 ∧ |S(‘secret’)[0] − S′(‘secret’)[0]| ≥ 1 (5.20)
L0 < 1 ∧ C(‘bpd{0}.CFI’)[1] < 1 ∧ |S(‘secret’)[3] − S′(‘secret’)[3]| ≥ 1 (5.21)
Rule (5.20) leaks the first bit of the secret when the ‘CFI’ value (i.e., C(‘bpd{0}.CFI’)) in the
prediction entry is 1 or 3, and (5.21) leaks the fourth bit when the ‘CFI’ value is 0 or 1. In these
cases, the ‘CFI’ value does not match the CFI portion of the instruction address (i.e., the address of
Line 18 in Fig. 5.16(e)), which was 0x800000800 + 0x44 (= 0b0 10 00100), yielding a CFI portion
of 0b 10 and bidx of 0b00. Because of the mismatch on CFI value, C(‘bpd{0}.state’) is ignored
and so speculation will not execute Lines 19–24. Though (5.20) and (5.21) are specific to the first
or fourth bit of the secret, respectively, analogous rules appear for each of bits 0-3.
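The bit-level conjuncts in these rules can be decoded mechanically into the sets of field values they admit. The sketch below assumes that both the ‘state’ and ‘CFI’ fields are 2 bits wide, consistent with the values listed above. The helper name values_matching is hypothetical.

```python
def values_matching(bit, want_set, width=2):
    """All `width`-bit values whose bit `bit` is 1 (want_set) or 0 (not want_set)."""
    return [v for v in range(2 ** width) if bool((v >> bit) & 1) == want_set]


state_ok = values_matching(bit=1, want_set=False)  # C('bpd{0}.state')[1] < 1
cfi_520 = values_matching(bit=0, want_set=True)    # C('bpd{0}.CFI')[0] >= 1
cfi_521 = values_matching(bit=1, want_set=False)   # C('bpd{0}.CFI')[1] < 1
print(state_ok, cfi_520, cfi_521)  # [0, 1] [1, 3] [0, 1]

# CFI portion of the Line 18 branch address, as stated in the text.
branch_cfi = 0b10
print(branch_cfi in cfi_520, branch_cfi in cfi_521)  # False False
```

The decoded sets match the values discussed in the text, and neither CFI conjunct admits the value 0b10, so both rules describe a CFI mismatch.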
We have performed this evaluation using earlier BOOM versions and noticed that the out-of-bounds
leakage was partially eliminated in version 2.2.3.³ Since that version, the miss handling
(MSHR) module of the L1 cache tracks branch prediction results and discards the pending cache
refill request if a misprediction is detected before the refill commit. This change prevents the bounds
check bypass in Fig. 5.16(b).
5.4.7 Performance
In DINoMe, we have four important components: an automated logical formula generator
(Sec. 5.3.1), a model counter (Sec. 5.3.3), a sampler (Sec. 5.3.4), and a rule learner (Sec. 5.2.2).
This section reports the time costs in the first three stages for all case studies we have evaluated.
We performed those experiments on a DELL PowerEdge R815 server with 2.3GHz AMD Opteron
6376 processors and 128GB memory.
The time to generate the logical postcondition is primarily influenced by the number of RISC-V
BOOM cycles represented by that postcondition, as we incrementally compose the formula cycle by
cycle. Computing Πproc required 20-40 minutes for the memory accessing experiments (100 cycles)
³ In BOOM version 2.2.1, the victim program described in Fig. 5.16(b) also suffers the out-of-bounds leakage and

































































Figure 5.18: Time used in one estimation of Ĵδ(S, S′). Legend: Unshared (∀ℓ : C(‘load’)[ℓ] = 1); Shared (∀ℓ : C(‘load’)[ℓ] = 0).
in Sec. 5.4.3 and Sec. 5.4.4; 45 minutes for the modular exponentiation experiments (120 cycles) in
Sec. 5.4.5; and around 2 hours for the Spectre experiments (150 cycles) in Sec. 5.4.6.
Fig. 5.18 shows the runtime to compute one estimate of Ĵ(S, S′) or Ĵδ(S, S′) in the model
counting process; note the logarithmic y-axis. Specifically, counting for cache-based side channels
in ScatterCache and PhantomCache is much more expensive than for the others, with one estimate
requiring up to 16 minutes. The difficulty in counting for ScatterCache (denoted by ‘Scatter’)
and PhantomCache (denoted by ‘Phantom’) is due to the large number of their counting variables.
For ScatterCache, the memory-to-cache mapping uses log2(c) × w bits per domain per memory
block for 32 memory blocks. Specifically, the 8-way, 2-set ScatterCache (denoted by ‘Scatter
(c = 2)’) uses 512 bits to represent I(M), which means the counting process would add hundreds of
XOR constraints to compute one estimate, which greatly increases the difficulty of finding a feasible
solution. To obtain the sample sets ÎS and N̂S , the sampling process generates a tuple in ÎS or
N̂S within seconds, as illustrated in Fig. 5.19.
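The 512-bit figure for I(M) follows from the formula log2(c) × w bits per domain per memory block. The sketch below reproduces that arithmetic, assuming two security domains (victim and attacker) and 32 memory blocks as described above. The function name mapping_bits is illustrative.

```python
def mapping_bits(num_sets, ways, num_blocks=32, num_domains=2):
    """Bits needed to encode the mapping: log2(c) * w per domain per block."""
    if num_sets < 1 or num_sets & (num_sets - 1):
        raise ValueError("num_sets must be a power of two")
    bits_per_index = num_sets.bit_length() - 1  # log2(c) for a power of two
    return bits_per_index * ways * num_blocks * num_domains


print(mapping_bits(num_sets=2, ways=8))  # 512
```

For the 8-way, 2-set configuration this gives 1 × 8 × 32 × 2 = 512 bits.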
Our reported results reflect estimations of Ĵ(S, S′) or Ĵδ(S, S′) for at least 100 S, S′ pairs per n,
and we sampled up to 100,000 tuples in ÎS and N̂S . These estimations and samplings are trivially
parallelizable and so, with horizontal scaling, can be performed in total times approaching those in
Fig. 5.18 and Fig. 5.19 to the extent budget allows.
5.5 Limitations
Despite the scalability improvements represented by DINoMe specifically for analyzing pro-
cessor designs, it still has limitations. First, due to the complexity of hardware logic, generating
the postcondition Πproc(C, I,S,O) for a proc representing both the OS and the application would


































































Figure 5.19: Time used in generating one tuple in N̂S or ÎS. Legend entries: Shared (∀ℓ : C(‘load’)[ℓ] = 0)-ÎS; Unshared-N̂S; Shared-N̂S; Shared (∀ℓ : C(‘load’)[ℓ] = 0)-N̂S.
The DINoMe workloads described in this chapter represent a tradeoff, using a sequence of opcodes
with concretized operations and selected symbolic operands on top of a partially symbolic hardware
specification. To evaluate more complicated software, a possible solution is to highly
concretize the initial hardware state (especially the memory and cache states) or highly concretize
the software, at the cost of possibly missing some potential leakage that remains hidden due to this
concretization.
A second limitation of DINoMe, and specifically of its generation of interpretation rules to
explain leakage, is that the interpretation rules may not be complete, for two reasons. First, the
interpretation rules might skip a rule that only covers a small portion of leakage samples (i.e., with
low recall). A possible solution to address this source of incompleteness is to declassify the sources
of leakage exposed in the inference rules that are learned, and then rerun the learning process.
Second, the conditions that result in leakage might be more complicated than can be learned using
decision trees built using local linear classifiers. To address this incompleteness, alternative learning
methods might be tried, though doing so while retaining interpretability will be a challenge.
5.6 Summary
Scaling high-fidelity, static noninterference measurement to complex computations has been a
challenge since the introduction of noninterference in the 1980s [44]. We believe that we have
advanced the state-of-the-art in this area both generally and specifically for its application to
processor designs. Certain innovations in our DINoMe framework, such as the cycle-by-cycle
construction of the logical postcondition for processor execution, are specific to processor designs.
Others, such as our methods for declassification and interpreting leakage results, are not. Together,
however, they permit the measurement of leakage in complex scenarios, as we demonstrated by
using DINoMe to analyze leakage due to speculative execution in the BOOM core and the leakage of
published defenses designed to mitigate it. Our analysis enables comparisons between defenses to discover, e.g., the
processor and defense parameterizations where one defense outperforms the other. Though the
performance of DINoMe suggests that static measurement of noninterference for processors is still
too time-intensive for highly interactive use, it is fast enough to permit multiple analysis iterations
per day in many cases. And through its improvements in declassification and interpretability, it
substantially facilitates human understanding of its measurement results.
CHAPTER 6: CONCLUSION
Any computation with insecure information flows is potentially vulnerable to side-channel at-
tacks. Noninterference was conceived as a requirement to eliminate any such flows. In practice,
however, absolute noninterference can rarely be achieved. This dissertation has thus made contributions
toward measuring noninterference for real-world programs.
First, we explored noninterference assessment using empirical evaluations against cache-based
side-channel attacks. Relying on these empirical evaluations, we demonstrated CacheBar’s effec-
tiveness in defending against cache-based side channels in LLCs. However, model checking revealed
that empirical assessment was not enough, as it failed to capture interference not triggered by the
concrete experiments. This lesson motivated the use of formal approaches for assessing noninter-
ference more holistically.
Second, we suggested a static method for measuring interference from actual codebases. One
contribution of our measurement is its formulation of interference as the distinguishability of two
sets of secret values. This novel metric supports noninterference measurement in multiple dimen-
sions, reflecting how often secrets leak when the size n of secret sets is small and how much is leaked
when n is large. Case studies showed that our measurement framework has moderate runtime costs,
which range from minutes to days depending on the workload and the computation resources (i.e.,
parallelization is possible).
Third, we extended the static framework to relatively complicated computations including the
processor on which they execute, and implemented them in DINoMe. By leveraging the declassifi-
cation and interpretation capabilities of DINoMe, we measured and explained hardware-software
vulnerabilities. DINoMe analyzes a sequence of instructions running for hundreds of cycles on
the BOOM processor. Logical rules sorted by their precision and recall values explain the sources
of leakage without forcing the analyst to diagnose the leakage from the measurement value alone.
We demonstrated the possibility of using our frameworks to measure and interpret unintended
leakage in practice using our case studies on side-channel leakage through shared caches, traffic
analysis, adaptive compression algorithms, shared TCP network counters, sliding-window modular
exponentiation algorithms, and speculative execution. We hope that these demonstrations of
noninterference measurement will bring quantitative information flow (QIF) closer to practice and
help analysts develop better mitigations.
REFERENCES
[1] Intel analysis of speculative execution side channels. Technical report, Intel Corp., Jan 2018.
[2] A. Pnueli, Y. Rodeh, O. Strichman, and M. Siegel. The small model property: How small can it
be? Information and Computation, 178(1):279–293, 2002.
[3] O. Aciiçmez. Yet another microarchitectural attack: Exploiting I-cache. In ACM Workshop
on Computer Security Architecture, pages 11–18, 2007.
[4] J. Alawatugoda, D. Stebila, and C. Boyd. Protecting encrypted cookies from compression
side-channel attacks. In Financial Cryptography and Data Security, pages 86–106, 2015.
[5] A. Arcangeli, I. Eidus, and C. Wright. Increasing memory density by using KSM. In Linux
Symposium, pages 19–28, 2009.
[6] R. A. Aziz, G. Chu, C. Muise, and P. Stuckey. #∃SAT: Projected model counting. In
International Conference on Theory and Applications of Satisfiability Testing, pages 121–137.
Springer, 2015.
[7] M. Backes, B. Köpf, and A. Rybalchenko. Automatic discovery and quantification of infor-
mation leaks. In 30th IEEE Symposium on Security and Privacy, pages 141–153, 2009.
[8] T. Ball, B. Cook, V. Levin, and S. K. Rajamani. SLAM and static driver verifier: Technology
transfer of formal methods inside Microsoft. In International Conference on Integrated Formal
Methods, pages 1–20. Springer, 2004.
[9] A. Banerjee, D. A. Naumann, and S. Rosenberg. Expressive declassification policies and
modular static enforcement. In 29th IEEE Symposium on Security and Privacy, pages 339–
353, 2008.
[10] D. J. Bernstein, J. Breitner, D. Genkin, L. G. Bruinderink, N. Heninger, T. Lange, C. V.
Vredendaal, and T. Yarom. Sliding right into disaster: Left-to-right sliding windows leak. In
International Conference on Cryptographic Hardware and Embedded Systems, pages 555–576.
Springer, 2017.
[11] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida. Dedup est machina: Memory deduplication
as an advanced exploitation vector. In 37th IEEE Symposium on Security and Privacy, pages
987–1004, 2016.
[12] C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-
coverage tests for complex systems programs. In 8th USENIX Symposium on Operating
Systems Design and Implementation, pages 209–224, 2008.
[13] Y. Cao, Z. Qian, Z. Wang, T. Dao, S. V. Krishnamurthy, and L. M. Marvel. Off-path TCP
exploits: Global rate limit considered dangerous. In 25th USENIX Security Symposium, pages
209–225, 2016.
[14] C. Celio, P. Chiu, B. Nikolic, D. A. Patterson, and K. Asanovic. BOOMv2: an open-source
out-of-order RISC-V core. In 1st Workshop on Computer Architecture Research with RISC-V
(CARRV), 2017.
[15] S. Chakraborty, K. S. Meel, and M. Y. Vardi. A scalable approximate model counter. In
Principles and Practice of Constraint Programming, volume 8124 of LNCS, pages 200–216,
2013.
[16] P. Chapman and D. Evans. Automated black-box detection of side-channel vulnerabilities
in web applications. In 18th ACM Conference on Computer and Communications Security,
pages 263–274, 2011.
[17] Q. A. Chen, Z. Qian, Y. J. Jia, Y. Shao, and Z. M. Mao. Static detection of packet injection
vulnerabilities: A case for identifying attacker-controlled implicit information leaks. In 22nd
ACM Conference on Computer and Communications Security, pages 388–400, 2015.
[18] S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A
reality today, a challenge tomorrow. In 31st IEEE Symposium on Security and Privacy, pages
191–206, 2010.
[19] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[20] V. Chipounov, V. Kuznetsov, and G. Candea. S2E: a platform for in-vivo multi-path analysis
of software systems. In 16th International Conference on Architectural Support for Program-
ming Languages and Operating Systems, pages 265–278, 2011.
[21] S. Chong and A. C. Myers. Security policies for downgrading. In 11th ACM conference on
Computer and communications security, pages 198–209, 2004.
[22] D. Clark, S. Hunt, and P. Malacaria. Quantitative analysis of the leakage of confidential data.
Electronic Notes in Theoretical Computer Science, 59(3):238–251, 2002.
[23] D. Clark, S. Hunt, and P. Malacaria. Quantitative information flow, relations and polymor-
phic types. Journal of Logic and Computation, 15(2):181–199, 2005.
[24] D. Clark, S. Hunt, and P. Malacaria. A static analysis for quantifying information flow in a
simple imperative language. Journal of Computer Security, 15(3):321–371, 2007.
[25] M. R. Clarkson, A. C. Myers, and F. B. Schneider. Belief in information flow. In 18th IEEE
Computer Security Foundations Workshop, pages 31–45, 2005.
[26] W. W. Cohen and Y. Singer. A simple, fast, and effective rule learner. 16th AAAI Conference
on Artificial Intelligence, 99:335–342, 1999.
[27] B. Coppens, I. Verbauwhede, K. D. Bosschere, and B. D. Sutter. Practical mitigations for
timing-based side-channel attacks on modern x86 processors. In 30th IEEE Symposium on
Security and Privacy, pages 45–60, 2009.
[28] S. Crane, A. Homescu, S. Brunthaler, P. Larsen, and M. Franz. Thwarting cache side-channel
attacks through dynamic software diversity. In 22nd Network and Distributed System Security
Symposium, pages 8–11, 2015.
[29] D. E. R. Denning. Cryptography and data security, volume 112. Addison-Wesley Reading,
1982.
[30] J. Dike. User-mode Linux. In 5th Annual Linux Showcase & Conference, pages 3–14, 2001.
[31] Y. Dodis, R. Ostrovsky, L. Reyzin, and A. Smith. Fuzzy extractors: How to generate strong
keys from biometrics and other noisy data. SIAM Journal on Computing, 38(1):97–139, 2008.
[32] B. Dutertre. Solving exists/forall problems with yices. In Workshop on Satisfiability Modulo
Theories, 2015.
[33] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private
data analysis. In Theory of Cryptography, pages 265–284, Berlin, Heidelberg, 2006. Springer
Berlin Heidelberg.
[34] K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Peek-a-boo, I still see you: Why
efficient traffic analysis countermeasures fail. In 33rd IEEE Symposium on Security and
Privacy, pages 332–346, 2012.
[35] E. Levina and P. Bickel. The earth mover’s distance is the Mallows distance. In 8th
International Conference on Computer Vision, pages 251–256, 2001.
[36] J. Fan. Local linear regression smoothers and their minimax efficiencies. The Annals of
Statistics, pages 196–216, 1993.
[37] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin. LIBLINEAR: A library for large linear
classification. Journal of Machine Learning Research, 9(Aug):1871–1874, 2008.
[38] A. Ferraiuolo, R. Xu, D. Zhang, A. C. Myers, and G. Suh. Verification of a practical hardware
security architecture through static information flow analysis. In 22nd International Confer-
ence on Architectural Support for Programming Languages and Operating Systems, pages
555–568, 2017.
[39] M. Fokkema. Fitting prediction rule ensembles with R package pre. Journal of Statistical
Software, 92(12):1–30, 2020.
[40] J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of
Statistics, pages 1189–1232, 2001.
[41] J. H. Friedman and B. E. Popescu. Predictive learning via rule ensembles. The Annals of
Applied Statistics, 2(3):916–954, 2008.
[42] R. Giacobazzi and I. Mastroeni. Abstract non-interference: Parameterizing non-interference
by abstract interpretation. ACM SIGPLAN Notices, 39(1):186–197, 2004.
[43] R. Giacobazzi and I. Mastroeni. Abstract non-interference: a unifying framework for weak-
ening information-flow. ACM Transactions on Privacy and Security (TOPS), 21(2):1–31,
2018.
[44] J. A. Goguen and J. Meseguer. Security policies and security models. In 3rd IEEE Symposium
on Security and Privacy, pages 11–20, 1982.
[45] C. P. Gomes, A. Sabharwal, and B. Selman. Model counting. Handbook of Satisfiability, pages
633–654, 2008.
[46] J. W. Gray. Toward a mathematical foundation for information flow security. In 12th IEEE
Symposium on Security and Privacy, pages 21–34, 1991.
[47] D. Gruss, C. Maurice, K. Wagner, S. Mangard, U. Zurutuza, and R. J. Rodríguez.
Flush+Flush: A stealthier last-level cache attack. In Detection of Intrusions and Malware,
and Vulnerability Assessment, volume abs/1511.04594, pages 279–299. Springer, 2016.
[48] X. Guo, R. G. Dutta, J. He, M. M. Tehranipoor, and Y. Jin. Qif-verilog: Quantitative
information-flow based hardware description languages for pre-silicon security assessment.
In IEEE International Symposium on Hardware Oriented Security and Trust, pages 91–100,
2019.
[49] Intel. Intel® 64 and IA-32 Architectures Software Developer’s Manual, 2010.
[50] G. Irazoqui, T. Eisenbarth, and B. Sunar. S$A: A shared cache attack that works across
cores and defies VM sandboxing—and its application to AES. In 36th IEEE Symposium on
Security and Privacy, pages 591–604, 2015.
[51] A. Ivrii, S. Malik, K. S. Meel, and M. Y. Vardi. On computing minimal independent support
and its applications to sampling and counting. Constraints, 21(1):41–58, 2016.
[52] G. Kellaris, G. Kollios, K. Nissim, and A. O’Neill. Generic attacks on secure outsourced
databases. In 23rd ACM Conference on Computer and Communications Security, pages
1329–1340, 2016.
[53] J. Kelsey. Compression and information leakage of plaintext. In 9th International Workshop
on Fast Software Encryption, pages 263–276, 2002.
[54] G. Keramidas, A. Antonopoulos, D. N. Serpanos, and S. Kaxiras. Non deterministic caches:
A simple and effective defense against side channel attacks. Design Automation for Embedded
Systems, 12(3):221–230, 2008.
[55] T. Kim, M. Peinado, and G. Mainar-Ruiz. STEALTHMEM: System-level protection against
cache-based side channel attacks in the cloud. In 21st USENIX Security Symposium, pages
189–204, 2012.
[56] V. Klebanov, N. Manthey, and C. Muise. SAT-based analysis and quantification of informa-
tion flow in programs. In International Conference on Quantitative Evaluation of Systems,
pages 177–192. Springer, 2013.
[57] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Man-
gard, and T. Prescher. Spectre attacks: Exploiting speculative execution. In 40th IEEE
Symposium on Security and Privacy, pages 1–19, 2019.
[58] R. Könighofer. A fast and cache-timing resistant implementation of the AES. In Topics in
Cryptology – CT-RSA, pages 187–202, 2008.
[59] B. Köpf and D. Basin. An information-theoretic model for adaptive side-channel attacks. In
14th ACM Conference on Computer and Communications Security, pages 286–296, 2007.
[60] B. Köpf and A. Rybalchenko. Approximation and randomization for quantitative information-
flow analysis. In 23rd IEEE Computer Security Foundations Symposium, pages 3–14, 2010.
[61] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea. Efficient state merging in symbolic
execution. In 33rd ACM Conference on Programming Language Design and Implementation,
pages 193–204, 2012.
[62] J. Lagniez, E. Lonca, and P. Marquis. Improving model counting by leveraging definability.
In 25th International Joint Conference on Artificial Intelligence, pages 751–757, 2016.
[63] J. Lagniez and P. Marquis. On preprocessing techniques and their impact on propositional
model counting. Journal of Automated Reasoning, 58(4):413–481, 2017.
[64] Linux blind TCP spoofing vulnerability. http://www.securityfocus.com/bid/580/info,
1999.
[65] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard,
P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg. Meltdown: Reading kernel memory from
user space. In 27th USENIX Security Symposium, pages 973–990, 2018.
[66] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser, and R. B. Lee. Catalyst: Defeating
last-level cache side channel attacks in cloud computing. In 22nd IEEE Symposium on High
Performance Computer Architecture, pages 406–418, 2016.
[67] F. Liu and R. B. Lee. Random fill cache architecture. In 47th IEEE/ACM International
Symposium on Microarchitecture, pages 203–215, 2014.
[68] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-level cache side-channel attacks are
practical. In 36th IEEE Symposium on Security and Privacy, pages 605–622, 2015.
[69] P. Malacaria. Assessing security threats of looping constructs. In 34th ACM Symposium on
Principles of Programming Languages, pages 225–235, 2007.
[70] N. Manthey. Coprocessor 2.0–a flexible CNF simplifier. In International Conference on
Theory and Applications of Satisfiability Testing, pages 436–441. Springer, 2012.
[71] P. Mardziel, M. S. Alvim, M. Hicks, and M. R. Clarkson. Quantifying information flow for
dynamic secrets. In 35th IEEE Symposium on Security and Privacy, pages 540–555, 2014.
[72] R. Morris. A weakness in the 4.2 BSD Unix TCP/IP software. AT&T Bell Labs, Tech.
Rep. Comput. Sci., 117, 1985.
[73] D. A. Osvik, A. Shamir, and E. Tromer. Cache attacks and countermeasures: the case of
AES. In Topics in Cryptology – CT-RSA, pages 1–20. Springer, 2006.
[74] R. Owens and W. Wang. Non-interactive OS fingerprinting through memory de-duplication
technique in virtual machines. In 30th IEEE International Performance Computing and
Communications Conference, pages 1–8, 2011.
[75] C. Percival. Cache missing for fun and profit. In BSDCan, 2005.
[76] Q. Phan, L. Bang, C. S. Păsăreanu, P. Malacaria, and T. Bultan. Synthesis of adaptive side-
channel attacks. In 30th IEEE Computer Security Foundations Symposium, pages 328–342,
2017.
[77] Q. Phan and P. Malacaria. Abstract model counting: A novel approach for quantification of
information leaks. In 9th ACM Symposium on Information, Computer and Communications
Security, pages 283–292, 2014.
[78] C. S. Păsăreanu, Q. Phan, and P. Malacaria. Multi-run side-channel analysis using symbolic
execution and max-SMT. In 29th IEEE Computer Security Foundations Symposium, pages
387–400, 2016.
[79] Z. Qian, Z. M. Mao, and T. Xie. Collaborative TCP sequence number inference attack –
how to crack sequence number under a second. In 19th ACM Conference on Computer and
Communications Security, pages 593–604, 2012.
[80] S. Raikin, J. D. Sager, Z. Sperber, E. Krimer, O. Lempel, S. Shwartsman, A. Yoaz, and
O. Golz. Tracking mechanism coupled to retirement in reorder buffer for indicating sharing
logical registers of physical register in record indexed by logical register, 2014. US Patent
8,914,617.
[81] A. Rane, C. Lin, and M. Tiwari. Raccoon: Closing digital side-channels through obfuscated
execution. In 24th USENIX Security Symposium, pages 431–446, 2015.
[82] M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explana-
tions. In 32nd AAAI Conference on Artificial Intelligence, pages 1527–1535, 2018.
[83] A. Sabelfeld and A. C. Myers. A model for delimited information release. In International
Symposium on Software Security, pages 174–191. Springer, 2003.
[84] A. Sabelfeld and D. Sands. Declassification: Dimensions and principles. Journal of Computer
Security, pages 517–548, 2009.
[85] S. Sanfilippo. Small strings compression library. https://github.com/antirez/smaz, 2009.
[86] G. Smith. On the foundations of quantitative information flow. In International Conference
on Foundations of Software Science and Computational Structures, pages 288–302. Springer,
2009.
[87] G. Smith. Quantifying information flow using min-entropy. In 8th International Conference
on Quantitative Evaluation of Systems, pages 159–167, 2011.
[88] M. Soos and K. S. Meel. BIRD: Engineering an efficient CNF-XOR SAT solver and its appli-
cations to approximate model counting. In 33rd AAAI Conference on Artificial Intelligence,
volume 33, pages 1592–1599, 2019.
[89] M. Soos, K. Nohl, and C. Castelluccia. Extending SAT solvers to cryptographic problems. In
Proceedings of the 12th International Conference on Theory and Applications of Satisfiability
Testing, pages 244–257, 2009.
[90] Q. Tan, Z. Zeng, K. Bu, and K. Ren. Phantomcache: Obfuscating cache conflicts with
localized randomization. In 27th Network and Distributed System Security Symposium, 2020.
[91] Z. Wang and R. B. Lee. New cache designs for thwarting software cache-based side channel
attacks. In 34th International Symposium on Computer Architecture, pages 494–505, 2007.
[92] Z. Wang and R. B. Lee. A novel cache architecture with enhanced performance and security.
In 41st IEEE/ACM International Symposium on Microarchitecture, pages 83–93, 2008.
[93] M. Werner, T. Unterluggauer, L. Giner, M. Schwarz, D. Gruss, and S. Mangard. ScatterCache:
Thwarting cache attacks via cache set randomization. In 28th USENIX Security Symposium,
pages 675–692, Santa Clara, CA, 2019.
[94] C. Wolf. Yosys open synthesis suite. http://www.clifford.at/yosys/.
[95] Y. Yarom and K. E. Falkner. FLUSH+RELOAD: A high resolution, low noise, L3 cache
side-channel attack. In 23rd USENIX Security Symposium, pages 719–732, 2014.
[96] K. Zhang, Z. Li, R. Wang, X. Wang, and S. Chen. Sidebuster: Automated detection and
quantification of side-channel leaks in web application development. In 17th ACM Conference
on Computer and Communications Security, pages 595–606, 2010.
[97] R. Zhang, C. Deutschbein, P. Huang, and C. Sturton. End-to-end automated exploit gen-
eration for validating the security of processor designs. In 51st IEEE/ACM International
Symposium on Microarchitecture, pages 815–827, 2018.
[98] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-VM side channels and their use
to extract private keys. In 19th ACM Conference on Computer and Communications Security,
pages 305–316, 2012.
[99] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-tenant side-channel attacks in
PaaS clouds. In 21st ACM Conference on Computer and Communications Security, pages
990–1003, 2014.
[100] Z. Zhou, Z. Qian, M. K. Reiter, and Y. Zhang. Static evaluation of noninterference using
approximate model counting. In 39th IEEE Symposium on Security and Privacy, pages 514–
528, 2018.
[101] Z. Zhou, M. K. Reiter, and Y. Zhang. A software approach to defeating side channels in last-
level caches. In 23rd ACM Conference on Computer and Communications Security, pages
871–882, 2016.
