Provably sound semantics stack for multi-core system programming with kernel threads by Alekhin, Artem
Provably Sound Semantics Stack
for Multi-Core System Programming
with Kernel Threads
Dissertation
submitted for the degree of
Doctor of Engineering (Dr.-Ing.)
at the Faculty of Mathematics and Computer Science
of Saarland University (Universität des Saarlandes)
submitted by
Artem Alekhin
Saarbrücken, August 2016
Date of the colloquium: 24.02.2017
Dean: Univ.-Prof. Dr. Frank-Olaf Schreyer
Chair of the examination committee: Prof. Dr. Holger Hermanns
1st reviewer: Prof. Dr. Wolfgang J. Paul
2nd reviewer: Prof. Dr. Andreas Podelski
Academic staff member: Dr. Hristo Pentchev
Abstract
Operating systems and hypervisors (e.g., Microsoft Hyper-V) for multi-core processor
architectures are usually implemented in high-level stack-based programming languages
integrated with mechanisms for multi-threaded task execution as well as for access to
low-level hardware features. Guaranteeing the functional correctness of such systems is
considered a challenge in the field of formal verification because it requires a sound
concurrent computational model comprising the programming semantics and the steps of the
specific hardware components visible to the system programmer.
In this doctoral thesis we address this issue and present a provably sound concurrent
model of kernel threads executing C code mixed with assembly, together with the basic
thread operations (i.e., creation, switch, exit, etc.) needed for thread management in OS
and hypervisor kernels running on industrial-like multi-core machines.
To justify the model, we establish a semantics stack. At its bottom, the multi-core
instruction set architecture, performing arbitrarily interleaved steps, executes the binary
code of the guests/processes being virtualized and the compiled source code of the kernel
linked with a library implementing the threads. After extending an existing general theory
for concurrent system simulation, and by utilising the order reduction of steps under
certain safety conditions, we apply the sequential compiler correctness to the concurrent
mixed programming semantics, connect the adjacent layers of the model stack, show the
required transfer of properties between them, and provide a paper-and-pencil proof of
correctness for the kernel threads implementation with lock-protected operations and an
efficient thread switch based on stack substitution.
Kurzzusammenfassung (Short Summary, translated from German)
Operating systems and hypervisors (e.g., Microsoft's Hyper-V) for multi-core processor
architectures are usually implemented in stack-based high-level programming languages
that integrate mechanisms for multi-threaded task execution as well as access to low-level
hardware features. Guaranteeing the functional correctness of such systems is regarded as
a challenge in the field of formal verification, since it requires a sound concurrent
computational model that comprises the programming language semantics as well as the
steps of the individual hardware components visible to system programmers.
In this dissertation we address these problems and present a provably sound concurrent
model of operating system kernel threads that execute mixed C and assembly code, together
with the basic thread operations (i.e., creation, switch, stop, etc.) needed for thread
management in the kernels of operating systems and hypervisors that are to run on
realistic multi-core processors.
To justify the model we set up a semantics stack whose lowest layer is the multi-core
processor architecture, which executes arbitrarily interleaved steps of the binary code of
virtualized guests or processes and of the compiled source code of the kernel, linked with
a library implementing the threads. After extending an existing general theory for the
simulation of concurrent systems, and by using the order reduction of steps under certain
safety conditions, we connect the adjacent layers of the model stack, show the required
property transfers between them, and give a paper-and-pencil proof of the correctness of
the kernel thread implementation with lock-protected operations and an efficient thread
switch based on stack substitution.
Acknowledgments
First and foremost, I would like to express my gratitude to Prof. Dr. Wolfgang Paul for inviting
me to join a leading research team in the field of formal verification and his valuable advice
on how to make complicated formal theories simple and clear even for “grandmothers” if, of
course, they are not professors in mathematics.
I am also indebted to my former colleagues at the chair of Prof. Paul, particularly to
Christoph Baumann, for productive discussions about the research topic, the issues found,
and their solutions.
I am deeply thankful to my friends, especially to David Spieler, Natalia Tsetskhladze-Spieler,
Farid Naghizade, Olaf Zeitz, and Guido Döhler, for their understanding, constant
encouragement, readiness to help when I was in need, and simply for the great time spent
together.
And finally, above all, I want to thank my parents and grandparents for believing in me, and
for their love, patience, and support over all the years of my studies, despite the thousands
of kilometres between us during my work on this PhD thesis.
Contents
1 Introduction
1.1 Motivation and Overview
1.2 Basic Notation
2 Model for Concurrent Systems and Simulation
2.1 Abstract Model with Memory and Ownership
2.1.1 Signature and Instantiation Parameters
2.1.2 Cosmos Machine Configuration and Semantics
2.1.3 Computations and Step Sequence Notation
2.1.4 Ownership Policy
2.2 Ownership Based Order Reduction
2.3 System Simulation in Concurrency
2.3.1 Block Machine Semantics and Reduction to Block Computations
2.3.2 Generalized Sequential Simulation Theorem
2.3.3 Cosmos Model Simulation
2.4 Order Reduction on the Concrete Level
2.4.1 Order Reduction for Suitable Schedules
2.4.2 Applying the Cosmos Model Simulation Theorem
2.5 Property Transfer From Abstract to Concrete Level
3 Formal Model of MIPS-86
3.1 Instruction Set Architecture Overview
3.1.1 Instruction Set
3.1.2 Store Buffers
3.1.3 Address Translation
3.1.4 Interrupts
3.2 Single Core MIPS-86 ISA
3.2.1 Configuration Overview
3.2.2 Memory and Store Buffer
3.2.3 Translation Lookaside Buffer
3.2.4 Processor Core
3.2.4.1 Configuration and Transition Overview
3.2.4.2 Instruction Execution
3.2.4.3 Interrupt Handling
3.2.5 Single Core MIPS-86 ISA Transitions
3.2.5.1 Processor Core Step
3.2.5.2 Store Buffer Step
3.2.5.3 Translation Lookaside Buffer Step
3.3 Multi-Core MIPS-86
4 Store Buffer Reduced MIPS-86 in the Context of Hypervisor/OS Kernels
4.1 Store Buffer Reduced MIPS-86
4.2 Memory Address Space in Hypervisor Context
4.3 MIPS-86 Cosmos Model Instantiations
4.3.1 Reference Machine Instantiation
4.3.2 Reduced Machine Instantiation
4.4 Store Buffer Reduction Correctness
4.4.1 Simulation for the Single Core Machine
4.4.2 SB Reduction for the Multi-Core MIPS-86
5 Concurrent Mixed Machine Semantics for MIPS-86
5.1 Sequential Macro Assembly Semantics
5.1.1 Instructions and Programs
5.1.2 Machine Configuration
5.1.3 Compiler Calling Convention for MIPS-86
5.1.4 Transition Function
5.2 Sequential Intermediate C Semantics
5.2.1 Types and Qualifiers
5.2.2 Values
5.2.3 Expressions and Statements
5.2.4 C-IL Programs
5.2.5 Machine Configuration
5.2.6 Environment Parameters
5.2.7 Semantics of C-IL Memory Accesses
5.2.8 Type and Expression Evaluation
5.2.9 Transition Function
5.3 Sequential Mixed Machine Semantics
5.3.1 MX Programs
5.3.2 Environment Parameters
5.3.3 Machine Configuration
5.3.4 Transition Function
5.4 Concurrent Mixed Machine Semantics
6 Compiler Correctness and Justification of Concurrent Mixed Model
6.1 Compiler Correctness for Sequential Mixed Machine
6.1.1 Mixed Program Compilation and Stack Layout
6.1.1.1 Macro Assembly Compiler Information and Stack Frames
6.1.1.2 C-IL Compiler Information and Stack Frames
6.1.1.3 Mixed Compiler Information and Stack
6.1.2 Sequential Mixed Compiler Correctness
6.1.2.1 Compiler Consistency Relation
6.1.2.2 Consistency Points
6.1.2.3 Accessed Addresses
6.1.2.4 Volatile Accesses, IO- and OT-Points
6.1.2.5 Addresses of Global Variables and Constants
6.1.2.6 Requirements and Conditions for MIPS-86 Machine
6.1.2.7 MX Software Conditions
6.1.2.8 Sequential MX Compiler Correctness in Concurrent Context
6.2 Concurrent MX Model Justification
6.2.1 Cosmos Model Instantiations
6.2.2 Sequential Simulation Theorem
6.2.3 Concurrent Model Simulation
6.2.4 Application of the Concurrent MX Machine Simulation
6.2.4.1 Properties Transfer from MX Machine to Reduced MIPS-86 ISA
6.2.4.2 Sketch for the Application of Store Buffer Reduction
7 Semantics and Correctness of Concurrent Extended Mixed Machine for MIPS-86
7.1 Sequential Extended Mixed Machine (MXA) Semantics
7.1.1 MXA Programs and Environment Parameters
7.1.2 Machine Configuration
7.1.3 Stack Information Abstraction
7.1.4 Base Address and Maximal Size of the Current Stack
7.1.5 MX Thread Configuration Reconstruction
7.1.6 Transition Function
7.2 Concurrent Extended Mixed Machine Semantics
7.3 Compiler Correctness for the Sequential Extended Mixed Machine
7.3.1 Sequential Simulation Relation
7.3.2 Consistency Points
7.3.3 Accessed Addresses
7.3.4 IO- and OT-Points
7.3.5 Requirements and Conditions for MIPS-86 Machine
7.3.6 MXA Software Conditions
7.3.7 Sequential MXA Compiler Correctness in Concurrent Context
7.4 Justification of the Concurrent Extended Mixed Model
7.4.1 Cosmos Model Instantiations
7.4.2 Sequential Simulation Theorem
7.4.3 Concurrent Model Simulation
7.4.4 Application of the Concurrent MXA Machine Simulation
8 Concurrent Kernel Threads: Model, Implementation, and Correctness Criteria
8.1 Abstract Model of Kernel Threads
8.1.1 Program and Parameters
8.1.2 Sequential Semantics in Concurrent Setting
8.1.2.1 Machine Configuration
8.1.2.2 Transition Function
8.1.3 Concurrent Semantics of Kernel Threads
8.2 Implementation of Kernel Threads
8.2.1 Framework for Kernel Threads
8.2.1.1 Data Structures
8.2.1.2 Implementation of Primitives
8.2.2 Program Linking and Obtaining the Implementation Model
8.3 Sequential Correctness in Concurrent Setting
8.3.1 Consistency Points
8.3.2 Well-Formed Implementation
8.3.3 Sequential Simulation Relation
8.3.4 Accessed Addresses
8.3.5 IO- and OT-Points
8.3.6 Requirements and Conditions for MXA Machine
8.3.7 Software Conditions for Kernel Threads
8.3.8 Sequential Correctness for Kernel Threads in Concurrent Context
8.4 Justification of the Concurrent Model
8.4.1 Cosmos Model Instantiations
8.4.2 Sequential Simulation Theorem
8.4.3 Concurrent Model Simulation and Its Application Overview
9 Conclusion and Future Work
Appendix A: Implementation of Spinlocks with Processor ID
Appendix B: Implementation of Operations on Doubly Linked Lists of Threads
Appendix C: Long List of “Small” Mistakes in PhD Dissertations/Books
List of Theorems
2.1 Lemma (Ownership Safe Steps Preserve the Ownership Invariant)
2.2 Lemma (Equivalent Reordering of IOIP Sequences)
2.3 Lemma (Interleaving Point Schedule Existence)
2.4 Lemma (Safety of Reordered Computations)
2.1 Corollary
2.5 Lemma (Environment Steps)
2.1 Theorem (IP-Schedule Order Reduction)
2.6 Lemma (Coverage)
2.7 Lemma (Block Machine Schedule Existence)
2.2 Theorem (Block Machine Reduction)
2.3 Theorem (Generalized Sequential Simulation Theorem)
2.8 Lemma (Memory Addresses without other Units’ Local Addresses)
2.9 Lemma (Safe Local Steps Preserve Shared Invariant)
2.4 Theorem (Cosmos Model Simulation Theorem)
2.1 Assumption (Safety Transfer with sinv and uinv Preservation for a Stepping Unit)
2.2 Assumption (Preservation of Well-Formedness for Other Units of Abstract Machine)
2.3 Assumption (Preservation of Well-Formedness for Other Units of Concrete Machine)
2.4 Assumption (Preservation of Simulation Relation and uinv for Other Units)
2.5 Theorem (IP-Schedule Order Reduction for Suitable Schedules)
2.5 Assumption (Well-Behaviour Restriction)
2.2 Corollary (Simulating IP-Schedules are Safe and Well-Behaved)
2.3 Corollary (IOIP Assumption for Simulating IP-Schedules)
2.4 Corollary (Ownership Transfer for Simulating Block Machines)
2.6 Theorem (Simulated Safety Property Transfer)
2.5 Corollary (Complete Simulated Property Transfer)
4.1 Theorem (SB Reduction for a Step of the Single Core MIPS-86)
4.1 Lemma (Hypervisor Memory Addresses Covered by the Unit’s SB Reduction Sequential Simulation)
4.2 Lemma (Shared and Read-Only Memory Equality from SB Reduction Simulation Relation)
4.3 Lemma (Safety Transfer and Invariants Preservation for Store Buffer Reduction)
4.4 Lemma (Preservation of Simulation Relation and Unit’s Invariant for SB Reduction)
6.1 Theorem (Sequential MX Compiler Correctness in the Concurrent Context)
6.2 Theorem (Sequential MX Compiler Correctness for Cosmos Model Simulation)
6.3 Theorem (Cosmos Model Simulation Theorem for all Mixed Machine Programs and System/Environment Parameters)
6.1 Lemma (MIPS-86 Well-Behaviour Restriction for Concurrent MX Simulation)
6.2 Lemma (Transfer of the Property PSpi,θ,ξMX )
6.1 Corollary (Well-Behaviour and Safety Transfer for Arbitrary Schedules in MX Simulation)
7.1 Lemma (Uniqueness of Found sba and mss)
7.2 Lemma (Unique Control Information Reconstruction)
7.1 Theorem (Sequential MXA Compiler Correctness in Concurrent Context)
7.2 Theorem (Sequential MXA Compiler Correctness for Cosmos Model Simulation)
7.3 Theorem (Cosmos Model Simulation Theorem for all MX Programs with Inline Assembly and System/Environment Parameters)
8.1 Theorem (Sequential Correctness for Kernel Threads in Concurrent Context)
8.2 Theorem (Sequential Kernel Threads Correctness for Cosmos Model Simulation)
8.1 Lemma (Ownership Invariant for the MXA Machine Implementing Kernel Threads)
8.2 Lemma (Safety Transfer and Invariants Preservation for Correctness of Concurrent Kernel Threads)
8.3 Lemma (Preservation of Simulation Relation for Concurrent Kernel Threads)
8.4 Lemma (Preservation of Well-Formedness for Other Units in Machine with Kernel Threads)
8.5 Lemma (Preservation of Well-Formedness for Other Units in Machine Implementing Kernel Threads)
8.3 Theorem (Cosmos Model Simulation Theorem for all Programs of Hypervisor / OS Kernels Using Threads)
1 Introduction
1.1 Motivation and Overview
Formal verification of operating systems and hypervisors remains an ongoing challenge for
the scientific community (see, e.g., the surveys of projects in [Kle09, CPS13, SF10]). Their
safety-critical nature and their inherent reliance on low-level hardware features make them a
target for pervasive formal verification [Moo03, APST10]. In pervasive formal verification,
the proof of correctness is given by establishing a model stack from the lowest level (e.g.,
gate-level hardware construction) to the highest level of abstraction (e.g., application-level
C semantics). Between neighboring models of the stack, simulation theorems are established
that are powerful enough to transfer properties and that reveal the required conditions and
subtleties. The result is a pervasive theory that allows one to show the correctness of
realistic systems.
To achieve the goal of formal verification for operating systems and hypervisors, one has
to obtain, and use for their implementation, a sound formal semantics that integrates
high-level programming language execution with the effects of invoking exposed hardware
features at an adequate level of abstraction. Based on that, the correctness of the system
implementation can be established by a simulation with an appropriate abstract model (or
specification). However, defining such a semantics for industrial-like systems is far from
trivial in the presence of thousands of pages of informal specification (e.g., the AMD64
Architecture Programmer’s Manual [Adv11a, Adv11b]) with a non-exhaustive description of
system functionality, especially when parallelism is exploited intensively.
One of the fundamental programming concepts enhancing such parallelism on single- and
multi-core architectures is splitting the execution of a process into threads that perform
separate tasks within the process [BC05, Lov10, Han96, But97]. Since threads share the same
address space and other resources and have only distinct local stacks, they are considered
light-weight in comparison to processes: they require much less overhead during scheduling
operations and do not need specific inter-process communication. Moreover, threads can take
advantage of multi-core hardware by being assigned to different processor cores on which
they can be executed (see footnote 1). By supporting thread migration, i.e., by associating
the same thread with multiple processors, an operating system kernel can schedule it
effectively with respect to the current workload in the system.
In this doctoral thesis we concentrate mostly on two topics crucial for the aforementioned
verification field: a sound concurrent programming semantics for the implementation and
verification of hypervisor and operating system kernels on an industrial-like multi-core
processor architecture, and the extension of this semantics with threads, which makes the
full power of the multi-threaded programming paradigm available at the kernel level.
Footnote 1: “Processor Affinity is the term used for describing the rules for associating
certain threads and certain processors.” [TMu03]
[Figure 1.1 (diagram). Labels: Simulated Virtual Machines (Virtual Machine 1 ... Virtual
Machine k, each with processors #i.1 ... #i.Ni and shared memory); Implementation of
Hypervisor Kernel (one active thread and sleeping threads per processor #1 ... #np, shared
memory, shared abstract resources); Multi-Core x86-64 (Processor #1 ... Processor #np,
memory system).]
Figure 1.1: Bird’s-eye view of a hypervisor kernel implemented with threads. Dashed lines
show the association between virtual processors and active threads (above), and between the
active threads and the physical processors executing them (below). The curved arrows depict
simultaneous accesses to shared resources.
Contribution and Related Works
The work covered in this thesis was inspired by the Verisoft XT project [The11], which aimed
to extend the pervasive theory developed in the earlier Verisoft project [Ver10] to
industrial multi-core processor systems. One of the goals of the project was the formal
correctness of the Microsoft Hyper-V hypervisor [LS09]. The kernel layer of this hypervisor
consists of threads associated with the virtual processors of guests (Figure 1.1). Such
virtual processors are scheduled to run on the physical ones by a simple switch of the
corresponding threads.
The correctness proof of Hyper-V was conducted with the help of VCC [Mic16, CDH+09], a
verification tool for concurrent C from Microsoft Research. Although a substantial part of
the hypervisor’s code was covered, the pervasive theory had to be redeveloped for multi-core
systems in order to argue about the soundness of applying VCC; research in this direction
was therefore continued at the chair of Prof. Dr. W. J. Paul after the end of the project.
That work is devoted mainly to the correctness of an academic version of a hypervisor and to
sound methods for showing it, albeit for a simpler MIPS multi-core architecture, introduced
by Schmaltz [Sch13], with fewer hardware virtualization features than the comprehensive x86
model [Deg11] used in Verisoft XT. A survey of the overall theory of multi-core hypervisor
verification and a clear road map for the research on this topic were given by Paul, Cohen,
and Schmaltz in [CPS13].
Since the x86 and MIPS architectures considered within Verisoft XT and in the research at
the chair have a relaxed, TSO-like (total store ordering) [Owe10] memory model rather than
the sequentially consistent memory [Lam79] mostly used by verification techniques for
concurrent programs (e.g., see [App11, CMST10]), Cohen and Schirmer [CS10] introduced a
store buffer flush policy and an ownership discipline that guarantee sequential consistency
on TSO. Their results were later extended by Kovalev and Geng Chen to the case where
memory-management steps are involved [CCK14]. In turn, Oberhauser [Obe16] improved this
programming discipline in order to obtain a simpler reduction theorem for x86-TSO.
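The difference between a TSO-like memory and sequentially consistent memory can be illustrated with a toy store-buffer model. The following Python sketch is only an illustration under simplifying assumptions: it is neither the MIPS-86 model of this thesis nor the flush policy of [CS10], and the names Core and litmus are hypothetical. Each core buffers its own writes in a FIFO queue; unless the buffers are drained before the loads, both cores can read the old value 0, an outcome that no sequentially consistent interleaving allows.

```python
from collections import deque


class Core:
    """Toy core with a private FIFO store buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory          # shared memory, a dict address -> value
        self.store_buffer = deque()   # pending (address, value) writes, oldest first

    def store(self, addr, value):
        # The write is buffered and not yet visible to other cores.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store-to-load forwarding: the newest buffered write to addr wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def flush(self):
        # Drain the buffer into shared memory in FIFO order (acts like a fence).
        while self.store_buffer:
            a, v = self.store_buffer.popleft()
            self.memory[a] = v


def litmus(flush_before_load):
    """Store-buffering test: each core writes one flag, then reads the other."""
    memory = {"x": 0, "y": 0}
    c0, c1 = Core(memory), Core(memory)
    c0.store("x", 1)
    c1.store("y", 1)
    if flush_before_load:
        c0.flush()
        c1.flush()
    r0 = c0.load("y")
    r1 = c1.load("x")
    return r0, r1


print(litmus(flush_before_load=False))  # both cores miss the other's write
print(litmus(flush_before_load=True))   # flushing restores the SC-like result
```

In this fixed schedule the unflushed run returns (0, 0), while flushing before each load returns (1, 1). A flush policy in the spirit described above aims to rule out the first kind of outcome for shared accesses.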
Since the code of Hyper-V is written in C/assembler, Schmaltz in [Sch13] developed an
operational semantics for the C Intermediate Language (C-IL), which is close to the C widely
used for system programming. Moreover, she enriched this abstract model with ghost components
needed for arguing about VCC soundness. Based on these results, Shadrin in [Sha12, SS12]
introduced an integrated high- and low-level programming semantics that allows inter-language
calls and returns between C-IL and Macro Assembly. Moreover, he sketched the correctness of
an optimizing compiler for such a mixed machine and used it to complete the formal
verification of a small hypervisor [AHPP10] on a simplified version of the single-core VAMP
architecture [BJK+06]. The semantics and the correctness of its implementation were mostly
based on the communicating virtual machines (CVM) [GHLP05, IdRT08, Tsy09], a model for a
generic operating system kernel communicating with user processes. The CVM for a single-core
MIPS architecture and its correctness proof were recently covered in detail in [PBLS15].
In order to apply the compiler correctness on a multi-core hardware architecture and to be
able to specify the steps of any more abstract model (e.g., hypervisor and OS kernels) of
the pervasive verification stack, Baumann [Bau14b] introduced the ownership-based order
reduction and formalized a general simulation theory for concurrent systems with shared
memories. Moreover, he adapted the C-IL and Macro Assembly semantics separately, redefined
the sequential compiler correctness in both cases for the MIPS architecture, and showed how
one can justify their concurrent models. Using these results for C-IL and applying them to a
restricted x86 from [Deg11], Kovalev managed to show the TLB virtualization in the context
of hypervisor verification, however, by abstracting away the details of the compiler
correctness in the concurrent setting. In his work he also assumed the hypervisor kernel
construction with threads from Figure 1.1 but did not consider their formal model and
correctness.
In the scope of the present doctoral thesis we close this gap by combining the
aforementioned results and applying the overall theory in detail up to the level of the
hypervisor/OS kernel implementation on the multi-core MIPS architecture. Along with a
detailed revision of the integrated semantics, its compiler correctness, and all
requirements and software conditions needed for the successful simulation, we extend the
model with operations crucial for system programming in the presence of process/guest steps.
Using the obtained sound semantics based on the compiler correctness, we formalize a model
of kernel threads and answer the questions that arose at the beginning of the work covered
here: (i) what is the semantics for a kernel implementation with threads, and (ii) how can
one implement the stack-based switch of threads using C code (compiled by an optimizing
compiler) mixed with assembly so that its correctness can be easily guaranteed.
To the best of my knowledge, though there exist quite a few works (e.g., [MPR07, FS05,
FSV+06, FSDG08, NYS07, GFSS12]) on the verification of multi-threaded concurrent programs,
they either almost exclusively use high-level calculi, or consider threads and operations on
them only on the level of assembly programming for simpler processor architectures. Therefore,
one can see our results here as the first attempt to argue about the correctness of a thread
implementation relying on properties of a high-level programming language compiler for an
industrial-like multi-core architecture.
Thesis Outline
This doctoral thesis is organized in 9 chapters.
• In the remainder of Chapter 1 we give the basic mathematical notation used throughout
the whole work.
• Chapter 2 is devoted to the general model for concurrent systems and simulation allowing
us to build the overall model stack considered in this thesis. All layers of this stack will
be formalized with respect to this theory and, therefore, have a similar structure.
• In Chapter 3 we present the computational model of the multi-core MIPS-86 on which the
higher layers of our model stack are implemented.
• In Chapter 4 we argue about a simple store buffer reduction for the MIPS-86 that can be
shown with the help of the theory from Chapter 2.
• Chapter 5 provides the definition of the concurrent mixed machine semantics integrating
C-IL and Macro Assembly for the MIPS-86 model.
• In Chapter 6 the compiler correctness for the mixed semantics from Chapter 5 is consid-
ered in detail so that it can be used in the rest of the work. Moreover, based on the results
from Chapter 2 we argue about the soundness of the concurrent mixed machine.
• In Chapter 7 we extend the semantics of the mixed machine with inline assembly execution,
which also allows us to integrate steps of guests and user processes into the model.
• Using the results from the previous chapters, in Chapter 8 we finally introduce the model
of kernel threads, give their implementation, and discuss the correctness criteria and proof
in the concurrent setting.
• In the last Chapter 9 we briefly summarize the results and discuss directions for future
work.
1.2 Basic Notation
In the scope of the work we mostly use the notation and basic mathematical concepts borrowed
from the book [KMP14] and the doctoral theses [Bau14b, Sch13].
Definitions and Shorthands
In order to define new types, predicates, and functions mathematically, we use the identity
symbol $\equiv$ marked with def. For instance, one could define the predicate $P(x, y)$ in the following
way:
\[ P(x, y) \overset{\text{def}}{\equiv} Q(x, y) \land R(x) \]
Each important named definition is stated in a numbered environment with a name of a newly
defined notion.
Moreover, for brevity we also introduce shorthands used locally in a certain context
or throughout the whole work. We mark such shorthands with the identity symbol. For
example, one could introduce the shorthand
\[ p_x \equiv P(x, y) \]
When a formula contains a conjunction we allow the symbol $\land$ between the conjuncts to be omitted
and the conjuncts to be written as a numbered list, e.g., we could also define $P(x, y)$ as
\[ P(x, y) \overset{\text{def}}{\equiv} \begin{array}{ll} (i) & Q(x, y) \\ (ii) & R(x) \end{array} \]
For a definition written in such a form, we can then refer to a certain item by its number
in the list of conjuncts, e.g., $P(x, y).(i)$.
Numbers and Sets
In the work we use the following notation for the standard mathematical sets:
• $\mathbb{N} \overset{\text{def}}{\equiv} \{1, 2, \ldots\}$ – the set of natural numbers excluding zero,
• $\mathbb{N}_0 \overset{\text{def}}{\equiv} \mathbb{N} \cup \{0\}$ – the set of natural numbers including zero,
• $\mathbb{N}_n \overset{\text{def}}{\equiv} \{1, 2, \ldots, n\}$ – the set of natural numbers in the range from 1 to a non-zero $n$,
• $\mathbb{Z} \overset{\text{def}}{\equiv} \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ – the set of integers,
• $[i : j] \overset{\text{def}}{\equiv} \{i, i + 1, \ldots, j\}$ – the interval of integers from $i$ to $j$ if $i \le j$, and the empty set
otherwise,
• $\mathbb{B} \overset{\text{def}}{\equiv} \{0, 1\}$ – the set of Boolean values. In order to refer to a Boolean value we also use the
well-known term bit.
We denote the cardinality of a finite set $A$ as $\#A$. In order to select an element from any set
$A$ we use the Hilbert choice operator $\epsilon A$. In particular, for a singleton set it returns the unique
element: $\epsilon\{x\} = x$.
Records
Definition 1.1 (Record Notation for Tuples). Let a set $A$ be a Cartesian product of sets $A_1, A_2, \ldots, A_k$
that represents a set of tuples $c \in A$ such that
\[ c = (a_1, a_2, \ldots, a_k) \]
Given names (labels) $n_1, n_2, \ldots, n_k$ for all individual elements of tuples in $A$ such that $n_i$ is
associated with an element $a_i$, we talk about $A$ as a set of labeled tuples or records and use $c.n_i$ to
refer to the $i$-th record field equal to $a_i$.
Instead of the explicit definition of $A$ as
\[ A \overset{\text{def}}{\equiv} A_1 \times A_2 \times \ldots \times A_k \quad \text{with } c = (c.n_1, c.n_2, \ldots, c.n_k) \]
we intend to use the shorter form
\[ A \overset{\text{def}}{\equiv} (n_1 \in A_1, n_2 \in A_2, \ldots, n_k \in A_k) \]
Definition 1.2 (Record Update). For updating a field $c.n_i$ of the record $c \in A$ from Definition 1.1
with a new value $v \in A_i$ we introduce the notation $c[n_i := v]$ such that
\[ c[n_i := v] \overset{\text{def}}{\equiv} c' \]
where
\[ \forall j \le k.\; c'.n_j = \begin{cases} v & : j = i \\ c.n_j & : \text{otherwise} \end{cases} \]
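The record notation and the record update can be illustrated with a small Python sketch. The record type `Config` and its field names are hypothetical and chosen only for illustration; an immutable dataclass is assumed as a stand-in for a labeled tuple.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Hypothetical record type with two labeled fields (not from the thesis).
    pc: int
    mode: str


def update(c, field, v):
    """Sketch of c[field := v]: a copy of c that differs from c only in `field`."""
    return replace(c, **{field: v})


c = Config(pc=0, mode="kernel")
c2 = update(c, "pc", 4)  # c itself is left unchanged
```

Here `c2.pc` is the new value while all other fields are copied from `c`, which mirrors the case distinction in Definition 1.2.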
Functions
In the work along with total functions f : X→ Y, we operate with partial functions g : X⇀ Y.
Obviously, the type of such a function g can be also defined as total mapping g : X′ → Y for
some X′ ⊂ X. To denote the domain of a given function we use the standard mathematical
notation dom (f) = X and dom (g) = X′.
Definition 1.3 (Function Domain Restriction). For a given function $f : X \to Y$ or $f : X \rightharpoonup Y$,
and a set $X' \subseteq \mathrm{dom}(f)$ we can obtain the restriction $f|_{X'} : X' \to Y$ of the function $f$ so that for all
$x \in X'$ we have
\[ f|_{X'}(x) \overset{\text{def}}{\equiv} f(x) \]
Definition 1.4 (Function Update). For any such function $f$ we can redefine (or update) its mapping
from a given element $a \in \mathrm{dom}(f)$ into a new value $v \in Y$ with the help of the notation
$f[a \mapsto v]$ such that for any $x \in \mathrm{dom}(f)$ one obtains
\[ f[a \mapsto v](x) \overset{\text{def}}{\equiv} \begin{cases} v & : x = a \\ f(x) & : \text{otherwise} \end{cases} \]
Definition 1.5 (Union of Functions). For any given functions $f : X \to Y$ and $g : U \to V$ with
disjoint domains $X \cap U = \emptyset$ we define
\[ (f \uplus g) : X \cup U \to Y \cup V \]
such that for all $x \in X \cup U$ we have
\[ (f \uplus g)(x) \overset{\text{def}}{\equiv} \begin{cases} f(x) & : x \in X \\ g(x) & : \text{otherwise} \end{cases} \]
Note that the same notation can be easily applied to partial functions if we know their domains.
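As a minimal sketch, functions with finite domains can be modeled as Python dictionaries, in which case restriction, update, and disjoint union become short helper functions. The names `restrict`, `fupdate`, and `funion` are illustrative and not part of the thesis.

```python
def restrict(f, xs):
    """Restriction of f to the set xs, which must be a subset of dom(f)."""
    return {x: f[x] for x in xs}


def fupdate(f, a, v):
    """f[a -> v]: remap an element a of dom(f) to the new value v."""
    if a not in f:
        raise KeyError("a must belong to dom(f)")
    g = dict(f)
    g[a] = v
    return g


def funion(f, g):
    """Union of two functions; only defined for disjoint domains."""
    if f.keys() & g.keys():
        raise ValueError("domains must be disjoint")
    return {**f, **g}
```

All three helpers return new dictionaries, so the original functions stay unchanged, as in the mathematical definitions.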
Sometimes in the work, in order to match formal definitions we will need to transform partial
functions into total ones.
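Definition 1.6 (Transformation of Partial Function into Total). We can transform a partial function
$f : X \rightharpoonup Y$ into a total one $\lceil f \rceil : X \to Y$ in such a way that for any $x \in X$ one computes
\[ \lceil f \rceil(x) \overset{\text{def}}{\equiv} \begin{cases} f(x) & : x \in \mathrm{dom}(f) \\ \epsilon Y & : \text{otherwise} \end{cases} \]
Analogously, for total mappings $f : X' \to Y$ such that one has $X' \subset X$ we can apply
\[ \lceil f \rceil_X \overset{\text{def}}{\equiv} \lceil f \rceil \]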
Obviously, we will use such transformations only for cases where we are not interested in the
added dummy function values.
Sequences and Bit-Strings
Definition 1.7 (Finite Sequences). A finite sequence (or string) $a$ of $n \in \mathbb{N}$ elements $a[i]$ or
$a_i$ from a set $A$ is represented as
\[ a = (a_1, \ldots, a_n) = a[1 : n] \]
and formalized as a mapping from the indices of the elements to the elements
\[ a : \mathbb{N}_n \to A \]
The set $A^n$ of sequences of length $n$ with elements from $A$ is then defined as
\[ A^n \overset{\text{def}}{\equiv} \{a \mid a : \mathbb{N}_n \to A\} \]
Definition 1.8 (Concatenation of Sequences). For two finite sequences $a \in A^n$, $b \in A^m$ with
$n, m \in \mathbb{N}$, the concatenation $a \circ b$ results in a sequence of length $n + m$ and is defined as
\[ a[1 : n] \circ b[1 : m] \overset{\text{def}}{\equiv} c[1 : n + m] \]
such that for all $i \in \mathbb{N}_{n+m}$
\[ c[i] = \begin{cases} a[i] & : i \le n \\ b[i - n] & : \text{otherwise} \end{cases} \]
Definition 1.9 (Empty Sequence). The empty sequence $\varepsilon$ is the unique sequence of length 0 and is
the only element of the set
\[ A^0 \overset{\text{def}}{\equiv} \{\varepsilon\} \]
The concatenation of the empty sequence with a sequence $a \in A^n$ for any $n \in \mathbb{N}_0$ satisfies
\[ a \circ \varepsilon = \varepsilon \circ a = a \]
Definition 1.10 (Sets of Finite Sequences). The set $A^+$ of non-empty finite sequences of elements
from a set $A$ is defined as
\[ A^+ \overset{\text{def}}{\equiv} \bigcup_{n \in \mathbb{N}} A^n \]
In turn, the set $A^*$ of all finite sequences of elements from $A$ is
\[ A^* \overset{\text{def}}{\equiv} A^+ \cup \{\varepsilon\} \]
To denote the length of a sequence $a \in A^*$ we use the standard notation $|a|$. Moreover, in
order to indicate that an element $x \in A$ belongs to the given sequence $a$ we overload the set
membership symbol:
\[ x \in a \overset{\text{def}}{\equiv} a \ne \varepsilon \land \exists i \in \mathbb{N}_{|a|}.\; a[i] = x \]
Definition 1.11 (Subsequences). For a finite sequence $a \in A^n$, and indices $i, j \in \mathbb{N}_n$, $i \le j$, we
form the subsequence $a[i : j]$ in the following way:
\[ a[i : j] \overset{\text{def}}{\equiv} c[1 : j - i + 1] \quad \text{with } c[k] = a[i + k - 1] \text{ for all } k \in \mathbb{N}_{j-i+1} \]
Additionally, for indices $i > j$ we simply set $a[i : j] \overset{\text{def}}{\equiv} \varepsilon$.
Definition 1.12 (Reverse of a Finite Sequence). In order to reverse a finite sequence $xs \in X^*$
we use the function $rev(xs) \in X^*$ computing the result as
\[ rev(xs) \overset{\text{def}}{\equiv} \begin{cases} x \circ rev(xs') & : xs = xs' \circ x \\ \varepsilon & : xs = \varepsilon \end{cases} \]
Definition 1.13 (Application of a Function to Elements of a Finite Sequence). Additionally,
we introduce the function $map : (X \to Y) \times X^n \to Y^n$ that applies a given function $f : X \to Y$ to
each element of a finite sequence $xs \in X^n$ of length $n \in \mathbb{N}$ such that every element with index
$i \in \mathbb{N}_n$ of the resulting sequence is computed as
\[ map(f, xs)[i] \overset{\text{def}}{\equiv} f(xs[i]) \]
Along with the finite sequences with elements numbered from left to right and starting from
1 we will also use sequences of bits where the elements have a different numbering used in
number representations.
Definition 1.14 (Bit-Strings). A bit-string (or bit-vector) $a : [0 : n - 1] \to \mathbb{B}$ containing elements
$a_i$ or $a[i]$ from the set $\mathbb{B}$ has the format
\[ a = (a_{n-1}, \ldots, a_0) = a[n - 1 : 0] \]
Moreover, by $\mathbb{B}^n$ we denote the set of all bit-strings of length $n \in \mathbb{N}$:
\[ \mathbb{B}^n \overset{\text{def}}{\equiv} \{a \mid a : [0 : n - 1] \to \mathbb{B}\} \]
Due to the different representation, we additionally need to overload the notations of the
concatenation and the subsequences for the bit-strings.
Definition 1.15 (Concatenation of Bit-Strings). For two bit-strings $a \in \mathbb{B}^n$, $b \in \mathbb{B}^m$ with $n, m \in \mathbb{N}$,
the concatenation $a \circ b$ is defined as
\[ a[n - 1 : 0] \circ b[m - 1 : 0] \overset{\text{def}}{\equiv} c[n + m - 1 : 0] \]
such that for all $i \in [0 : n + m - 1]$
\[ c[i] = \begin{cases} b[i] & : i \le m - 1 \\ a[i - m] & : \text{otherwise} \end{cases} \]
Definition 1.16 (Bit-Substrings). For given $n \in \mathbb{N}$, $i, j \in [0 : n - 1]$, $j \ge i$, and a bit-string $a \in \mathbb{B}^n$,
we define a bit-substring $a[j : i]$ of $a$ as
\[ a[j : i] \overset{\text{def}}{\equiv} c[j - i : 0] \quad \text{with } c[k] = a[i + k] \text{ for all } k \in [0 : j - i] \]
In the work we will deal with a 32-bit processor architecture and refer to a 32-bit string $a \in \mathbb{B}^{32}$
as a word. A bit-string of length 8 is called a byte. Consequently, every word consists of 4 bytes.
Definition 1.17 (Byte Extraction). Let $n = 8 \cdot k$ for some $k \in \mathbb{N}$ and let $a \in \mathbb{B}^n$ be a bit-string
consisting of $k$ bytes. For $i \in [0 : k - 1]$ we define the byte $i$ of the string $a$ as
\[ byte(i, a) \overset{\text{def}}{\equiv} a[8 \cdot (i + 1) - 1 : 8 \cdot i] \]
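The right-to-left indexing of bit-strings can be made concrete with a short Python sketch. It assumes, purely for illustration, that a bit-string is stored as a Python `str` written with the most significant bit first, so that ordinary string concatenation coincides with the concatenation of Definition 1.15; the helper names are hypothetical.

```python
def bit(a, i):
    """a[i]: bit i of a, where bit 0 is the rightmost character."""
    return a[len(a) - 1 - i]


def sub(a, j, i):
    """a[j : i] for j >= i: the bits j down to i, leftmost first."""
    n = len(a)
    return a[n - 1 - j : n - i]


def byte(i, a):
    """byte(i, a): byte i of a bit-string whose length is a multiple of 8."""
    return sub(a, 8 * (i + 1) - 1, 8 * i)


halfword = "11110000" + "10101010"  # concatenation: the left operand is the high part
```

For the 16-bit string `halfword`, byte 0 is the low (rightmost) byte and byte 1 the high one.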
Definition 1.18 (Repeating Bits). For a bit $x \in \mathbb{B}$ and a number $n \in \mathbb{N}$ we denote a bit-string
containing $n$ copies of $x$ by $x^n$. Formally, it is defined inductively as
\[ x^1 \overset{\text{def}}{\equiv} x \qquad x^{n+1} \overset{\text{def}}{\equiv} x \circ x^n \]
Note that for brevity in this work we allow the symbol $\circ$ for the concatenation of
bit-strings and finite sequences to be omitted, e.g., $x \circ x^n = x x^n$.
Sometimes, it is more convenient to consider strings of bytes instead of bit-strings.
Definition 1.19 (Byte-Strings). A byte-string $a : [0 : n - 1] \to \mathbb{B}^8$ has the same format as a bit-string
(indexing from right to left), but contains elements $a_i$ or $a[i]$ from the set $\mathbb{B}^8$. Naturally,
by $(\mathbb{B}^8)^n$ we denote the set of all byte-strings of length $n \in \mathbb{N}$:
\[ (\mathbb{B}^8)^n \overset{\text{def}}{\equiv} \{a \mid a : [0 : n - 1] \to \mathbb{B}^8\} \]
Obviously, the definitions for the set (B8)∗, concatenation of byte-strings, and byte-substrings
are similar to the ones stated above, and we do not repeat them here for brevity. Additionally,
we introduce two functions converting bit-strings into byte-strings and vice versa.
Definition 1.20 (Bit-Strings and Byte-Strings Conversion). For $k \in \mathbb{N}$, a bit-string $x \in \mathbb{B}^{8 \cdot k}$,
and a byte-string $y \in (\mathbb{B}^8)^k$, the functions
\[ bits2bytes(x) \overset{\text{def}}{\equiv} y \quad \text{and} \quad bytes2bits(y) \overset{\text{def}}{\equiv} x \]
perform the conversion between bit-strings and byte-strings in such a way that for all indices
$i \in [0 : k - 1]$ and $j \in [0 : 7]$ we have
\[ y[i][j] = x[8 \cdot i + j] \]
Binary Arithmetic
Definition 1.21 (Binary Numbers). For bit-strings $a \in \mathbb{B}^n$ of length $n \in \mathbb{N}$ we denote by
\[ \langle a \rangle \overset{\text{def}}{\equiv} \sum_{i=0}^{n-1} a_i \cdot 2^i \]
the interpretation of $a$ as a binary number. We call $a$ the binary representation of length $n$ of the
natural number $\langle a \rangle \in \mathbb{N}_0$. In turn, the set of numbers with binary representation of length $n$ is
defined as
\[ B_n \overset{\text{def}}{\equiv} \{\langle a \rangle \mid a \in \mathbb{B}^n\} \]
It is easy to show (see [KMP14]) that we have
\[ B_n = [0 : 2^n - 1] \]
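Definition 1.22 (Binary Representation of a Number). For a number $x \in B_n$ we obtain its
binary representation $bin_n(x)$ of length $n \in \mathbb{N}$ as
\[ bin_n(x) \overset{\text{def}}{\equiv} \epsilon \{a \mid a \in \mathbb{B}^n \land \langle a \rangle = x\} \]
For a shorter notation we tend to use $x_n$ instead of $bin_n(x)$:
\[ x_n \overset{\text{def}}{\equiv} bin_n(x) \]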
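Definition 1.23 (Binary Addition and Subtraction). The operations of the binary addition $+_n$
and subtraction $-_n$ on two bit-strings $a, b \in \mathbb{B}^n$ are defined as follows:
\[ a +_n b \overset{\text{def}}{\equiv} bin_n((\langle a \rangle + \langle b \rangle) \bmod 2^n) \]
\[ a -_n b \overset{\text{def}}{\equiv} bin_n((\langle a \rangle - \langle b \rangle) \bmod 2^n) \]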
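The following Python sketch illustrates binary interpretation, binary representation, and modular addition and subtraction. As an assumption for illustration only, bit-strings are stored as Python strings with the most significant bit first; the function names are hypothetical.

```python
def val(a):
    """<a>: the bit-string a interpreted as a binary number."""
    return int(a, 2)


def bin_n(n, x):
    """bin_n(x): the unique n-bit representation of x, for 0 <= x < 2**n."""
    if not 0 <= x < 2 ** n:
        raise ValueError("x has no binary representation of length n")
    return format(x, "0{}b".format(n))


def add_n(a, b):
    """a +_n b: addition modulo 2**n on bit-strings of equal length n."""
    n = len(a)
    return bin_n(n, (val(a) + val(b)) % 2 ** n)


def sub_n(a, b):
    """a -_n b: subtraction modulo 2**n (Python's % is non-negative here)."""
    n = len(a)
    return bin_n(n, (val(a) - val(b)) % 2 ** n)
```

Overflow and underflow simply wrap around, e.g. adding 1 to the all-ones string of length 4 yields the all-zeros string.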
Definition 1.24 (Set of Words Starting from a Given One). For d ∈ N words starting from a
word a ∈ B32 we easily define the set
{a}d def≡ {a+32 i32 | i ∈ [0 : d− 1]}
Definition 1.25 (Two's Complement Numbers). For bit-strings $a \in \mathbb{B}^n$ of length $n \in \mathbb{N}$ we
denote by
\[ [a] \overset{\text{def}}{\equiv} -a_{n-1} \cdot 2^{n-1} + \langle a[n - 2 : 0] \rangle \]
the interpretation of $a$ as a two's complement number and refer to $a$ as the two's complement
representation of $[a]$. The set of integer numbers with two's complement representation of length $n$ is
denoted as
\[ T_n \overset{\text{def}}{\equiv} \{[a] \mid a \in \mathbb{B}^n\} \]
It is equal to the following integer interval:
\[ T_n = [-2^{n-1} : 2^{n-1} - 1] \]
One can easily notice $[a] < 0 \iff a_{n-1}$. Therefore, $a_{n-1}$ is called the sign bit.
Definition 1.26 (Two's Complement Representation of a Number). The two's complement
representation of an integer number $x \in T_n$ is obtained as
\[ twoc_n(x) \overset{\text{def}}{\equiv} \epsilon \{a \mid a \in \mathbb{B}^n \land [a] = x\} \]
Definition 1.27 (Zero- and Sign-Extensions of Bit-Strings). For $a \in \mathbb{B}^n$, and $n, k \in \mathbb{N}$ such
that $k > n$, we define zero-extended and sign-extended bit-strings $zxt_k(a)$ and $sxt_k(a)$ respectively
as follows:
\[ zxt_k(a) \overset{\text{def}}{\equiv} 0^{k-n} a \]
\[ sxt_k(a) \overset{\text{def}}{\equiv} a_{n-1}^{k-n} a \]
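A small Python sketch can illustrate the two's complement interpretation together with zero- and sign-extension. Bit-strings are again assumed to be Python strings with the most significant bit first, and all function names are illustrative only.

```python
def twoc_val(a):
    """[a]: the bit-string a interpreted as a two's complement number."""
    n = len(a)
    low = int(a[1:], 2) if n > 1 else 0  # <a[n-2 : 0]>
    return -int(a[0]) * 2 ** (n - 1) + low


def zxt(k, a):
    """zxt_k(a): prepend k - n zeros to the n-bit string a (k > n)."""
    return "0" * (k - len(a)) + a


def sxt(k, a):
    """sxt_k(a): prepend k - n copies of the sign bit a[n-1] (k > n)."""
    return a[0] * (k - len(a)) + a
```

Sign-extension preserves the two's complement value, whereas zero-extension preserves the unsigned binary value.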
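Definition 1.28 (Bit-Operations on Bit-Strings). For bit-strings $a, b \in \mathbb{B}^n$ of length $n \in \mathbb{N}$, a bit
$c \in \mathbb{B}$, and operations $\bullet \in \{\land, \lor, \oplus\}$ we define the corresponding operations on bit-strings as
follows:
\[ a \bullet_n b \overset{\text{def}}{\equiv} (a_{n-1} \bullet b_{n-1}, \ldots, a_0 \bullet b_0) \]
\[ c \bullet_n b \overset{\text{def}}{\equiv} (c \bullet b_{n-1}, \ldots, c \bullet b_0) \]
\[ \neg_n a \overset{\text{def}}{\equiv} (\neg a_{n-1}, \ldots, \neg a_0) \]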
Computations
All models considered in this work are represented as deterministic automata whose transition
relations $\delta$ are described by transition functions
\[ \delta : C \times \Sigma \rightharpoonup C \]
where $C$ is a set of states (or configurations), and $\Sigma$ is a set of inputs, also called the input alphabet.
This means that for modeling non-deterministic systems we formalize automata in such a
way that their transition relations are made deterministic by resolving any non-deterministic
choice with the help of the input alphabet.
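Definition 1.29 (Multiple Transitions). Given sequences of configurations $c \in C^{n+1}$ and inputs
$in \in \Sigma^n$ with $n \in \mathbb{N}_0$, and a number of steps $k \le n$, the notation $c_1 \longrightarrow^{k}_{\delta,in} c_{k+1}$ denotes that
$c[1 : k + 1]$ is obtained by applying the transition function $\delta$ for $k$ steps with the corresponding
inputs from $in$. Formally, we define this as
\[ c_1 \longrightarrow^{k}_{\delta,in} c_{k+1} \overset{\text{def}}{\equiv} \begin{cases} c_1 \longrightarrow^{k-1}_{\delta,in} c_k \land c_{k+1} = \delta(c_k, in_k) & : k > 0 \\ 1 & : k = 0 \end{cases} \]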
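Generally, for two configurations $s, s' \in C$ and inputs $in \in \Sigma^n$ we denote that $s'$ is reached in $n \in \mathbb{N}_0$
steps from $s$ in the following way:
\[ s \longrightarrow^{n}_{\delta,in} s' \overset{\text{def}}{\equiv} \exists a \in C^{n+1}.\; a_1 \longrightarrow^{n}_{\delta,in} a_{n+1} \land s = a_1 \land s' = a_{n+1} \]
As special cases, one can also define the same notation for a one-step computation with an input
$in \in \Sigma$
\[ s \longrightarrow_{\delta,in} s' \overset{\text{def}}{\equiv} s \longrightarrow^{1}_{\delta,in} s' \]
and for an arbitrarily long computation with inputs $in \in \Sigma^*$
\[ s \longrightarrow^{*}_{\delta,in} s' \overset{\text{def}}{\equiv} s \longrightarrow^{|in|}_{\delta,in} s' \]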
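The multi-step notation amounts to folding a partial transition function over an input sequence. The sketch below is a hypothetical Python rendering in which an undefined transition is modeled by returning `None` (so `None` must not be used as a state); the automaton `counter` is a toy example and not taken from the thesis.

```python
def run(delta, c, inputs):
    """Return the state reached from c under the given inputs,
    or None if some transition along the way is undefined."""
    for x in inputs:
        c = delta(c, x)
        if c is None:
            return None
    return c


def counter(c, x):
    # Toy partial transition function: adding x is undefined above 3.
    nxt = c + x
    return nxt if nxt <= 3 else None
```

With an empty input sequence the start state is returned unchanged, matching the case k = 0 of Definition 1.29.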

2
Model for Concurrent
Systems and Simulation
The pervasive verification approach for complex computer systems requires building stacks of
semantics from the hardware layer up to the semantics of abstract models whose implementations
rely on the correctness of all underlying levels including assemblers, compilers, etc. In such a
stack adjacent levels are connected by simulation theorems, which allow one to identify the software
conditions under which the simulation of the topmost abstract model on the lowest implementation
layer can be shown.
However, for models proven to be correct only for their sequential execution we often cannot
directly use their correctness in the concurrent setting. As an example, one could consider
a compiler of a high-level programming language verified for the sequential execution of
compiled code. When this code runs on a multi-core machine, where the processor
steps are interleaved arbitrarily, one has to justify the concurrent semantics of the same language
with the atomic execution of its statements.
For this justification and for establishing concurrent semantics stacks, Christoph Baumann in his
doctoral thesis [Bau14b], based on the theory developed earlier in [LS89, CL98], suggested
employing sequential simulation theorems for blocks of consecutive processor steps. Such blocks,
starting in configurations where the simulation can hold (so-called consistency points), are derived
via reordering from arbitrarily interleaved transitions under the condition that an applied
programming discipline guarantees the absence of memory races. Moreover, the step permutation
must preserve the order of shared memory accesses (I/O-steps) as well as local steps for
all units present in a model (see Figure 2.1 as an example).
Since in our work we intensively use Baumann’s theory of order reduction and concurrent
simulation, this chapter is fully devoted to its details. Though most of the definitions, lemmas
and theorems are extracted from [Bau14b], we also introduce a few extensions and corrections
that were not taken into account in the original work and which will allow us to apply the
theory for our topic. Therefore, for additional or substantially modified claims requiring more
argumentation we provide our detailed proofs here.
2.1 Abstract Model with Memory and Ownership
In order to model concurrent systems we introduce a generic model including a fixed number of
execution units accessing a shared memory. Following [Bau14b], the model is named Cosmos,
an acronym for Concurrent system with shared memory and ownership. The Cosmos
model is based on memory and unit configurations with their transition functions, additional
predicates characterizing unit steps, and a ghost ownership state that abstracts the
programmer's policy guaranteeing safe memory accesses.
[Figure 2.1: diagram with three rows labeled "arbitrary interleaving", "reordered computation", and "simulated computation"; the steps belong to units 1, 2, and 3, and the rows are related by sim with subscripts 1,2,3 and 1,2.]
Figure 2.1: Application of order reduction and simulation for a concurrent model with three
units. Grey boxes depict I/O-steps whereas empty ones are local. One-directional arrows show
the permutation of steps for reordering. The reordered interleaving is performed at the units'
consistency points, marked as black dots, where the simulation relation sim holds for the units named
in the subscript. Note that Unit 3 has not reached its consistency point and cannot be covered
by sim, though the changes of the shared memory must be reflected on the abstract level.
Conversely, the steps of Unit 2 have changed only its local configuration and are not considered in
the abstraction yet (dotted box).
2.1.1 Signature and Instantiation Parameters
The Cosmos model is characterized by a set of instantiable parameters (types and functions)
which we call a Cosmos model signature.
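Definition 2.1 (Cosmos Model Signature). A Cosmos model $S \in \mathbb{S}$ is given by a tuple
\[ S \overset{\text{def}}{\equiv} (\mathcal{A}, \mathcal{V}, \mathcal{R}, nu, \mathcal{U}, \mathcal{E}, reads, \delta, \mathcal{IP}, \mathcal{IO}, \mathcal{OT}) \]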
with the following named components:
• A,V – sets of memory addresses and memory values. In the underlying machine instan-
tiating the Cosmos model one usually considers the memory as a mapping m : A → V .
• R ⊆ A – a fixed set of read-only addresses, that could include, for instance, a non-
modified memory region of a compiled code, constants, etc.
• nu – a number of computation units in the machine.
• U – a set of computation unit states defined in the underlying semantics. For simplicity
we assume that all units are instantiated with the same automaton.
• E – a set of external inputs for the units. Though the units communicate via the shared
memory, the external inputs are often used to model the communication with an external
environment as well as to resolve unit’s non-deterministic behaviour.
• $reads : \mathcal{U} \times (\mathcal{A} \to \mathcal{V}) \times \mathcal{E} \to 2^{\mathcal{A}}$ – a set of memory addresses read during a step for given
unit and memory configurations, and an external input. In the work we call it a reads-set.
• $\delta : \mathcal{U} \times (\mathcal{A} \rightharpoonup \mathcal{V}) \times \mathcal{E} \rightharpoonup \mathcal{U} \times (\mathcal{A} \rightharpoonup \mathcal{V})$ – a unit transition function taking as inputs a unit
state, a partial memory, and external inputs. The result of the defined transition is a unit
configuration and a part of the memory after the step.
The partial memory in the function inputs is restricted to the reads-set including all memory
addresses needed for the step. The output partial memory represents the updated
portion of the memory configuration. In contrast to [Bau14b], the transition function is
supposed to be a partial mapping, which is more typical for many semantics and
disambiguates the further mathematical formalism in the work.
• IP : U × (R → V) × E → B – a predicate specifying desired interleaving points (IP-
points) on the base of a unit configuration, an external input, and the read-only memory.
When the given unit is in the interleaving point and makes a step we allow computations
performed by other units to appear (or interleave) before it.
• IO : U × (R → V) × E → B – a predicate denoting whether a unit step from a given
configuration and determined by an external input is suitable for an IO-operation. The
IO-operations are used for interactions with the environment and other units and could
be, e.g., atomic accesses to the shared portion of memory. The configuration before such a
unit step is called an IO-point.
• OT : U × (R → V) × E → B – indicates whether a given IO-point can be used for the
ownership transfer.
The predicate OT was not introduced in the theory developed by Baumann. However, the
application of the theory to different models showed that such a flag allows us to restrict the
ownership policy for cases where the ownership transfer is undesirable (e.g., non-deterministic
processor component steps that cannot be influenced by the programmer, or guest steps whose
memory access policy is unknown to the hypervisor) and makes the model cleaner without
introducing additional restrictions on the safety policy and ownership annotation later on.
In the work we also adapt all definitions with respect to OT . Since Baumann considered the
order reduction theory with the ownership transfer at any IO-points, it can be treated as a more
general case. The presence of our additional flag does not influence substantially any proofs of
the order reduction in [Bau14b] and requires only simple additional bookkeeping not changing
the way of argumentation. Therefore, for such theorems and lemmas from [Bau14b] we do not
repeat their proofs in this thesis.
For any Cosmos model S ∈ S we will use in the work the shorthands A,V,R, etc. to refer to
the corresponding components of its signature when it is clear from the context which machine
we deal with.
For a meaningful model instantiation one has to guarantee that the reads-set includes all
relevant addresses to determine which addresses are read in one unit step.
Definition 2.2 (Instantiation Restriction for reads). By the predicate $insta_r$ we require that
the reads-set contains all addresses upon whose memory contents it depends. For any Cosmos
machine $S$ let $u \in \mathcal{U}$ be a computation unit state, $m, m' \in (\mathcal{A} \to \mathcal{V})$ memory configurations,
and $in \in \mathcal{E}$ be an input for a step of the unit. If the memory contents agree on the reads-set
$R = S.reads(u, m, in)$, then the reads-set wrt. $m'$ also agrees with $R$.
\[ insta_r(S) \overset{\text{def}}{\equiv} (m'|_R = m|_R \implies S.reads(u, m', in) = R) \]
Moreover, as stated above, not all IO-points can be used for the ownership transfer. The flag
OT must be instantiated only for a subset of IO-operations.
Definition 2.3 (Instantiation Restriction for OT ). For a Cosmos machine $S$ let $u \in \mathcal{U}$ be a
computation unit state, $m \in (\mathcal{R} \to \mathcal{V})$ read-only memory, and $in \in \mathcal{E}$ be an input for a step of the
unit. Then we require that any OT-point is also an IO-point.
\[ insta_{ot}(S) \overset{\text{def}}{\equiv} (\mathcal{OT}(u, m, in) \implies \mathcal{IO}(u, m, in)) \]
Note, that in the thesis we will provide instantiations of the Cosmos model with different
levels of semantics in the verification stack. However, we will rely on the correctness of the
instantiations without proving the aforementioned restriction properties. The detailed proof of
the property instar for some of the models used in this work can be found in [Bau14b].
2.1.2 Cosmos Machine Configuration and Semantics
Now we define a Cosmos machine configuration consisting of a concurrent machine state and an
ownership state. The concurrent machine state represents a machine instantiating the Cosmos
model in the concurrent setting. The ownership state is a ghost state which does not influence
the underlying semantics of the concurrent model and is used for modelling the memory access
policy for the further justification of the model with a few execution units.
Definition 2.4 (Concurrent Machine State). The concurrent machine state for a Cosmos model
$S \in \mathbb{S}$ is a pair
\[ M_S \overset{\text{def}}{\equiv} (u : \mathbb{N}_{nu} \to \mathcal{U},\; m : \mathcal{A} \to \mathcal{V}), \]
where $u$ maps unit indices to their configurations and $m$ is the state of the memory.
Definition 2.5 (Ownership State). The ownership (ghost) state for a Cosmos model $S \in \mathbb{S}$ is a pair
\[ G_S \overset{\text{def}}{\equiv} (\mathcal{O} : \mathbb{N}_{nu} \to 2^{\mathcal{A}},\; \mathcal{S} \in 2^{\mathcal{A}}), \]
where $\mathcal{O}$ maps unit indices to the corresponding units' sets of owned addresses (own-sets) and
$\mathcal{S}$ is a set of shared writable addresses.
Definition 2.6 (Cosmos Machine Configuration). A Cosmos machine configuration for a Cosmos
model $S \in \mathbb{S}$ is given as a pair
\[ C_S \overset{\text{def}}{\equiv} (M \in M_S,\; G \in G_S) \]
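For a configuration $C \in C_S$ of a Cosmos model $S \in \mathbb{S}$ and a unit index $p \in \mathbb{N}_{nu}$ we use the
following shorthands:
\[ \begin{array}{ll} C.u_p \equiv C.M.u(p) & C.m \equiv C.M.m \\ C.\mathcal{O}_p \equiv C.G.\mathcal{O}(p) & C.\mathcal{S} \equiv C.G.\mathcal{S} \end{array} \]
\[ \begin{array}{l} reads_p(C, in) \equiv reads_p(C.M, in) \equiv reads(C.M.u(p), C.M.m, in) \\ \mathcal{IO}_p(C, in) \equiv \mathcal{IO}_p(C.M, in) \equiv \mathcal{IO}(C.M.u(p), C.M.m|_{\mathcal{R}}, in) \\ \mathcal{OT}_p(C, in) \equiv \mathcal{OT}_p(C.M, in) \equiv \mathcal{OT}(C.M.u(p), C.M.m|_{\mathcal{R}}, in) \\ \mathcal{IP}_p(C, in) \equiv \mathcal{IP}_p(C.M, in) \equiv \mathcal{IP}(C.M.u(p), C.M.m|_{\mathcal{R}}, in) \end{array} \]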
Definition 2.7 (Writes-set for a Machine Step). For a given Cosmos model $S$ with a machine
configuration $M \in M_S$ and an input $in \in \mathcal{E}$ we can determine the set of written addresses in the
corresponding step of a unit $p$. This so-called writes-set is obtained with the following function.
\[ writes_p(M, in) \overset{\text{def}}{\equiv} \mathrm{dom}(m'), \quad \text{where } (u', m') = \delta(M.u(p), M.m|_{reads_p(M,in)}, in) \]
For a whole Cosmos model configuration $C \in C_S$ we overload the shorthand
\[ writes_p(C, in) \equiv writes_p(C.M, in) \]
The transition of the Cosmos machine involves transitions on the concurrent machine and the
ownership state. Before we define the overall Cosmos machine transition function we provide
separate semantics of its configuration components.
The Cosmos machine units execution is performed by their transition function provided by
the model instantiation. A scheduling input shows which unit makes a step.
Definition 2.8 (Concurrent Machine Transition Function). A transition step of the machine for
a Cosmos model $S \in \mathbb{S}$ is defined by the transition function
\[ \Delta_t : M_S \times \mathbb{N}_{nu} \times \mathcal{E} \rightharpoonup M_S, \quad \text{s.t. } \Delta_t(M, p, in) \overset{\text{def}}{\equiv} M' \]
which takes a configuration $M$, a scheduling input $p$, an external input $in$, performs a step of
the unit $p$ on its state and the memory by the instantiated $\delta$ of the Cosmos model, and produces
the result configuration
\[ M' = \begin{cases} (M.u[p \mapsto u'],\; m_{unchanged} \uplus m') & : c' = (u', m') \\ \text{undefined} & : \text{otherwise} \end{cases} \]
where $c' = \delta(M.u(p), M.m|_{reads_p(M,in)}, in)$ and $m_{unchanged} = M.m|_{\mathcal{A} \setminus \mathrm{dom}(m')}$.
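The concurrent machine step can be sketched in Python as follows. The instantiation is a toy assumption made only for illustration: a unit state is a single integer register, an input is a pair `(op, address)` with `op` being `"load"` or `"store"`, dictionaries model finite functions, and an undefined transition is modeled by `None`. None of these names come from the thesis.

```python
def reads(u, mem, inp):
    # Toy reads-set: a load reads its address, a store reads nothing.
    op, a = inp
    return {a} if op == "load" else set()


def delta(u, part, inp):
    # Toy unit transition on a partial memory restricted to the reads-set.
    op, a = inp
    if op == "load":
        return part[a], {}      # new register value, no memory update
    if op == "store":
        return u, {a: u}        # register unchanged, one updated cell
    return None                 # undefined for any other input


def step(units, mem, p, inp):
    """One step of unit p; returns (units', mem') or None if undefined."""
    R = reads(units[p], mem, inp)
    res = delta(units[p], {a: mem[a] for a in R}, inp)
    if res is None:
        return None
    u2, m2 = res
    # {**mem, **m2} keeps the unchanged cells and overrides the written ones.
    return {**units, p: u2}, {**mem, **m2}
```

The unit transition only ever sees the memory restricted to its reads-set, and the resulting memory is the union of the unchanged part with the written part, as in Definition 2.8.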
A few additional inputs serve the ownership changes and are provided by the verification
engineer annotating the system execution (e.g. based on a program code).
Definition 2.9 (Ownership Transfer Information). For a transition step of a Cosmos machine
with a signature $S \in \mathbb{S}$ we define the ownership transfer information
\[ \Omega_S \overset{\text{def}}{\equiv} (Acq \in 2^{\mathcal{A}},\; Loc \in 2^{\mathcal{A}},\; Rel \in 2^{\mathcal{A}}) \]
that includes sets of acquired addresses $Acq$, acquired local addresses $Loc$ (which should be a
subset of $Acq$), and released addresses $Rel$.
Then the ownership transfer function for the ownership component of the Cosmos model can
be defined as follows.
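Definition 2.10 (Ownership Transfer Function). A transition step of the ownership component
of $S \in \mathbb{S}$ is defined by a function
\[ \Delta_o : G_S \times \mathbb{N}_{nu} \times \Omega_S \to G_S, \]
\[ \Delta_o(G, p, o) \overset{\text{def}}{\equiv} (G.\mathcal{O}[p \mapsto \mathcal{O}'],\; \mathcal{S}'), \]
where $\mathcal{O}' = (G.\mathcal{O}(p) \setminus o.Rel) \cup o.Acq$ and $\mathcal{S}' = (G.\mathcal{S} \setminus o.Loc) \cup o.Rel$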
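The ownership transfer function translates almost literally into set operations. In this Python sketch the own-sets are assumed to be stored in a dictionary from unit indices to sets of addresses; the function name `transfer` and the concrete addresses are illustrative.

```python
def transfer(O, S, p, acq, loc, rel):
    """Ownership step of unit p: returns the updated (O, S).

    O maps unit indices to own-sets, S is the set of shared addresses;
    acq, loc, rel are the acquired, acquired-local, and released sets.
    """
    O2 = dict(O)
    O2[p] = (O[p] - rel) | acq   # drop released, add acquired addresses
    S2 = (S - loc) | rel         # locally acquired leave S, released join S
    return O2, S2
```

For example, a unit that acquires a shared address locally and releases one of its owned addresses swaps their roles: the acquired address leaves the shared set while the released one enters it.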
Therefore, the Cosmos machine semantics can now be defined on the transitions of the sepa-
rate machine components.
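Definition 2.11 (Cosmos Machine Transition Function). For a Cosmos model $S \in \mathbb{S}$ we define
the transition function
\[ \Delta : C_S \times \mathbb{N}_{nu} \times \mathcal{E} \times \Omega_S \rightharpoonup C_S, \]
\[ \Delta(C, p, in, o) \overset{\text{def}}{\equiv} \begin{cases} (M', \Delta_o(C.G, p, o)) & : \Delta_t(C.M, p, in) = M' \\ \text{undefined} & : \text{otherwise} \end{cases} \]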
2.1.3 Computations and Step Sequence Notation
Recall that we split the Cosmos machine configuration into the concurrent machine state and
the ghost ownership state. To distinguish transitions of both components we introduced the
corresponding transition functions. To argue about reordering of transitions we rely on the
execution steps of the concurrent machine (as well as the whole Cosmos machine) from a certain
alphabet rather than on the resulting system states. In our case the alphabet contains transition
information and ownership transfer information. The latter was already defined as $\Omega_S$, and we
now switch to the step information needed for the concurrent machine transitions as well as a
combination of both to denote the Cosmos machine steps.
Definition 2.12 (Concurrent Machine Step Information). The set $\Theta_S$ of step information
describes a concurrent machine step of a Cosmos model $S \in \mathbb{S}$
\[ \Theta_S \overset{\text{def}}{\equiv} (s \in \mathbb{N}_{nu},\; in \in \mathcal{E},\; io \in \mathbb{B},\; ot \in \mathbb{B},\; ip \in \mathbb{B}) \]
by the scheduling parameter $s$, the external input $in$ for the step, and the flags $io$, $ot$, and $ip$
characterizing the step as an IO-point with the possible ownership transfer if $ot$ is set, and as
an interleaving point of the reordered computation, respectively.
We call a sequence $\theta \in \Theta_S^*$ of machine steps a machine step (or transition) sequence. Analogously,
$o \in \Omega_S^*$ is an ownership transfer sequence.
Definition 2.13 (Cosmos Machine Step Information). A step of the Cosmos machine contains
the concurrent machine step information and the ownership transfer information:
\[ \Sigma_S \overset{\text{def}}{\equiv} (t \in \Theta_S,\; o \in \Omega_S) \]
For any step information $\alpha \in \Sigma_S$ we use shorthands such that for $x \in \{s, in, io, ot, ip\}$
and $y \in \{Acq, Loc, Rel\}$ one can write $\alpha.x \equiv \alpha.t.x$ and $\alpha.y \equiv \alpha.o.y$. A sequence $\sigma \in \Sigma_S^*$
is a Cosmos machine step sequence. We also introduce shorthands mapping sequences of step
information $\sigma$ to transition and ownership transfer sequences.
\[ \sigma.t \equiv \sigma_1.t \cdots \sigma_{|\sigma|}.t \qquad \sigma.o \equiv \sigma_1.o \cdots \sigma_{|\sigma|}.o \]
Definition 2.14 (Step Notation). The notation $M \overset{t}{\mapsto} M'$ denotes that the transition $t \in \Theta_S$ is
executed from the machine state $M \in M_S$ and results in $M' \in M_S$. Additionally, $t.io$, $t.ot$, and
$t.ip$ correspond to the values of the predicates $\mathcal{IO}$, $\mathcal{OT}$, and $\mathcal{IP}$ respectively.
\[ M \overset{t}{\mapsto} M' \overset{\text{def}}{\equiv} M' = \Delta_t(M, t.s, t.in) \land \mathcal{IP}_{t.s}(M, t.in) = t.ip \land \mathcal{IO}_{t.s}(M, t.in) = t.io \land \mathcal{OT}_{t.s}(M, t.in) = t.ot \]
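For steps $\alpha \in \Sigma_S$, which include ownership transfer information, we define a similar notation
for the Cosmos machine transition from configuration $C \in C_S$ to $C' \in C_S$.
\[ C \overset{\alpha}{\mapsto} C' \overset{\text{def}}{\equiv} C.M \overset{\alpha.t}{\mapsto} C'.M \land C'.G = \Delta_o(C.G, \alpha.s, \alpha.o) \]
The definitions naturally extend to step sequences by induction. For a sequence $\rho \in \Sigma_S^* \cup \Theta_S^*$,
a step $\alpha \in \Sigma_S \cup \Theta_S$, and given configurations $X, X' \in M_S \cup C_S$ we have:
\[ X \overset{\varepsilon}{\longmapsto} X' \overset{\text{def}}{\equiv} X' = X \]
\[ X \overset{\rho\alpha}{\longmapsto} X' \overset{\text{def}}{\equiv} \exists X''.\; X \overset{\rho}{\longmapsto} X'' \land X'' \overset{\alpha}{\mapsto} X' \]
We call a pair $(X, \lambda) \in (C_S \times \Sigma_S^*) \cup (M_S \times \Theta_S^*)$ a Cosmos machine computation if the following
predicate holds:
\[ comp(X, \lambda) \overset{\text{def}}{\equiv} \exists X' \in C_S \cup M_S.\; X \overset{\lambda}{\longmapsto} X' \]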
To convert a pair (θ, o) consisting of transition and ownership transfer sequences of the same
length |θ| = |o| into a Cosmos machine step sequence σ we use a construct 〈θ, o〉 = σ such that
σ.t = θ and σ.o = o.
2.1.4 Ownership Policy
To enforce the memory safety guaranteeing the absence of memory races for the concurrent
execution of any Cosmos machine, we use the memory access policy defined in [Bau14b] and
based on the ghost ownership state of the Cosmos machine configuration.
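Definition 2.15 (Ownership Memory Access Policy). Given a flag io ∈ B, a reads-set R, a
writes-set W, a set $\mathcal{O}$ of addresses owned by a unit, sets $\mathcal{S}$ and $\mathcal{R}$ of shared and read-only
addresses respectively, and a set $\overline{\mathcal{O}}$ of addresses owned by other units, we enforce the
ownership memory access policy given by the following predicate
\[ policy_{acc}(io, R, W, \mathcal{O}, \mathcal{S}, \mathcal{R}, \overline{\mathcal{O}}) \overset{\mathrm{def}}{\equiv} \]
1. Local steps (i) read only owned or read-only addresses and (ii) write only owned unshared
addresses
\[ \neg io \implies \text{(i)}\ R \subseteq \mathcal{O} \cup \mathcal{R} \qquad \text{(ii)}\ W \subseteq \mathcal{O} \setminus \mathcal{S} \]
2. IO-steps may (i) read owned, shared and read-only addresses while they (ii) may write
owned addresses and shared addresses which are not owned by other units.
\[ io \implies \text{(i)}\ R \subseteq \mathcal{O} \cup \mathcal{S} \cup \mathcal{R} \qquad \text{(ii)}\ W \subseteq \mathcal{O} \cup (\mathcal{S} \setminus \overline{\mathcal{O}}) \]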
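The two cases of the access policy can be checked mechanically on finite address sets. The following is a minimal Python sketch, not part of the original theory: the function name `policy_acc` and all parameter names are chosen here for illustration, and `owned_by_others` stands for the set of addresses owned by other units.

```python
def policy_acc(io, reads, writes, owned, shared, read_only, owned_by_others):
    """Illustrative check of the ownership memory access policy (Definition 2.15).

    All address sets are plain Python sets; the names are assumptions of this sketch.
    """
    if not io:
        # Local step: read owned or read-only addresses, write owned unshared ones.
        return reads <= owned | read_only and writes <= owned - shared
    # IO-step: may also read shared addresses and write shared addresses
    # that are not owned by any other unit.
    return (reads <= owned | shared | read_only
            and writes <= owned | (shared - owned_by_others))
```

For example, with `owned = {1, 2}`, `shared = {2, 3, 4}` and `owned_by_others = {4}`, a local step may write address 1 but not the owned shared address 2, while an IO-step may write address 3 but not address 4.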
Moreover, since we allow the ownership state of the Cosmos machine to be changed, we also
introduce a safety policy for such transitions.
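Definition 2.16 (Ownership Transfer Policy). Given a bit ot ∈ B, a set $\mathcal{O}$ of owned addresses,
a set $\mathcal{S}$ of shared addresses, a set $\overline{\mathcal{O}}$ of addresses owned by other units, as well as
ownership transfer information o ∈ ΩS , we restrict ownership transfer by the predicate
\[ policy_{trans}(ot, \mathcal{O}, \mathcal{S}, \overline{\mathcal{O}}, o) \overset{\mathrm{def}}{\equiv} \]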
1. The ownership-state may not be changed by local steps or IO-steps at which we forbid
ownership transfer.
¬ot =⇒ o.Acq = ∅ ∧ o.Loc = ∅ ∧ o.Rel = ∅
2. For IO-steps suitable for ownership transfer, the ownership state is allowed to change as
long as the step (i) acquires only addresses which are shared unowned or already owned
by the executing unit and (ii) releases only owned addresses. Moreover (iii) the acquired
local addresses must be a subset of the acquired addresses and (iv) one may not acquire
and release the same address at the same time.
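\[
\begin{aligned}
ot \implies\ & \text{(i)}\ \ o.Acq \subseteq (\mathcal{S} \setminus \overline{\mathcal{O}}) \cup \mathcal{O} \\
& \text{(ii)}\ \ o.Rel \subseteq \mathcal{O} \\
& \text{(iii)}\ \ o.Loc \subseteq o.Acq \\
& \text{(iv)}\ \ o.Acq \cap o.Rel = \emptyset
\end{aligned}
\]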
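The transfer policy can be sketched in the same style. This is an illustrative Python check under the assumption that the ownership transfer information o is given as three finite sets `acq`, `loc` and `rel`; the function and parameter names are not taken from the original work.

```python
def policy_trans(ot, owned, shared, owned_by_others, acq, loc, rel):
    """Illustrative check of the ownership transfer policy (Definition 2.16)."""
    if not ot:
        # Steps not marked for ownership transfer must leave the ownership state alone.
        return not acq and not loc and not rel
    return (acq <= (shared - owned_by_others) | owned  # (i) shared unowned or already owned
            and rel <= owned                           # (ii) release only owned addresses
            and loc <= acq                             # (iii) acquired local addresses are acquired
            and not (acq & rel))                       # (iv) no address acquired and released at once
```

A step with `ot` set that acquires a shared address owned by another unit, or that acquires and releases the same address, is rejected by this check.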
Note that in the original work by Christoph Baumann the ownership transfer policy was
considered for any IO-point, which in turn is enough for the order reduction theorem. However,
simulation between two concurrent machines might require a stronger policy with the transfer
only at certain IO-points. Therefore, in our work we are interested only in the safety properties
that allow us not only to apply the order reduction but also to argue about the simulation.
In fact, since we mark only IO-points to be suitable for the operations on the ownership, one
can easily show that our restricted policy does not violate policy trans(io,O,S,O, o) from the
original theory.
Naturally, one has to restrict how the address space considered for the Cosmos model is split
into the subsets of addresses with respect to the ownership state (e.g., different units cannot own
the same addresses). For this purpose we state an invariant that will be preserved if the Cosmos
machine execution follows the aforementioned policies.
Definition 2.17 (Ownership Invariant). We introduce an ownership invariant on an ownership
state G ∈ GS of a Cosmos machine which requires (i) the own-sets of different units to be mutu-
ally disjoint and (ii) that read-only addresses may not be owned or shared-writable. Moreover
(iii) the complete address space consists of all the owned, shared, and read-only addresses.
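\[
\begin{aligned}
oinv(\mathcal{G}) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \forall p, q.\ p \neq q \implies \mathcal{G}.\mathcal{O}(p) \cap \mathcal{G}.\mathcal{O}(q) = \emptyset \\
& \text{(ii)}\ \ \forall p.\ \mathcal{G}.\mathcal{O}(p) \cap \mathcal{R} = \emptyset \ \wedge\ \mathcal{G}.\mathcal{S} \cap \mathcal{R} = \emptyset \\
& \text{(iii)}\ \ \mathcal{A} = \mathcal{R} \cup \mathcal{G}.\mathcal{S} \cup \bigcup_{p \in \mathbb{N}_{nu}} \mathcal{G}.\mathcal{O}(p)
\end{aligned}
\]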
Moreover we set the shorthand oinv(C) ≡ oinv(C.G) for all C ∈ CS .
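The three parts of the ownership invariant can be illustrated on a small finite address space. The sketch below is only an illustration: the function name `oinv` mirrors the predicate, while the representation of the own-sets as a dictionary `own` from unit indices to sets is an assumption made here.

```python
from itertools import combinations


def oinv(address_space, read_only, shared, own):
    """Illustrative check of the ownership invariant (Definition 2.17).

    own maps each unit index to the set of addresses owned by that unit.
    """
    # (i) own-sets of different units are mutually disjoint
    disjoint = all(not (own[p] & own[q]) for p, q in combinations(own, 2))
    # (ii) read-only addresses are neither owned nor shared
    ro_clean = (all(not (o & read_only) for o in own.values())
                and not (shared & read_only))
    # (iii) owned, shared and read-only addresses cover the whole address space
    covers = address_space == read_only | shared | set().union(*own.values())
    return disjoint and ro_clean and covers
```

Note that an address may be both owned and shared, which the invariant permits; only read-only addresses are excluded from both sets.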
Now we combine both policies for a step of the Cosmos machine and define it inductively for
step sequences.
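Definition 2.18 (Ownership Safety of a Step). We consider a step of a Cosmos model S from
configuration C ∈ CS with step information α ∈ ΣS to be safe with respect to the ownership
model when for $R = reads_{\alpha.s}(C, \alpha.in)$, $W = writes_{\alpha.s}(C, \alpha.in)$, and
$\overline{\mathcal{O}} = \bigcup_{q \neq \alpha.s} C.\mathcal{O}_q$ the following predicate is fulfilled
\[
\begin{aligned}
safe_{step}(C, \alpha) \overset{\mathrm{def}}{\equiv}\ & policy_{acc}(\alpha.io, R, W, C.\mathcal{O}_{\alpha.s}, C.\mathcal{S}, \mathcal{R}, \overline{\mathcal{O}}) \ \wedge \\
& policy_{trans}(\alpha.ot, C.\mathcal{O}_{\alpha.s}, C.\mathcal{S}, \overline{\mathcal{O}}, \alpha.o)
\end{aligned}
\]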
Definition 2.19 (Ownership Safety of a Step Sequence). For a configuration C of a Cosmos
model S and τ ∈ Σ∗S , α ∈ ΣS we define
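\[
\begin{aligned}
safe(C, \varepsilon) &\overset{\mathrm{def}}{\equiv} oinv(C) \\
safe(C, \tau\alpha) &\overset{\mathrm{def}}{\equiv} safe(C, \tau) \ \wedge\ \exists C', C''.\ C \overset{\tau}{\longmapsto} C' \overset{\alpha}{\mapsto} C'' \ \wedge\ safe_{step}(C', \alpha)
\end{aligned}
\]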
As we have mentioned before, the ownership safe steps of a Cosmos machine from a given
configuration preserve the ownership invariant. The proof of this fact was shown in [Bau14b].
Lemma 2.1 (Ownership Safe Steps Preserve the Ownership Invariant). For Cosmos machine
configurations C,C ′ ∈ CS and a step sequence σ ∈ Σ∗S we have
\[ C \overset{\sigma}{\longmapsto} C' \ \wedge\ safe(C, \sigma) \implies oinv(C') \]
2.2 Ownership Based Order Reduction
The introduced ownership and safety policies allow us to guarantee not only race-free concurrent
memory accesses by units present in the system, but also to reorder all unit steps after their
arbitrarily interleaved execution. Baumann [Bau14b] showed that in this case arbitrary schedules
can be reduced to so-called interleaving point schedules (or IP-schedules) preserving the result
of computations, memory safety and other properties. So, if we prove such properties for all
IP-schedules from a given configuration, they also hold for any arbitrarily interleaved computations
starting in the same machine configuration.
First, we define the meaning of IP-schedules.
Definition 2.20 (Interleaving Point Schedule). For a step sequence ρ ∈ Σ∗S ∪Θ∗S we define the
predicate
IPsched(ρ) def≡ ( ρ = ρ′αβ =⇒ IPsched(ρ′α) ∧ ((α.s = β.s) ∨ β.ip) )
that expresses whether the sequence is an interleaving point schedule (IP-schedule).
In other words, an IP-schedule consists of interleaved blocks of consecutive steps of a single
unit, where every block except possibly the first one starts in an interleaving point.
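Unfolding the recursion in the definition, a sequence is an IP-schedule exactly when every pair of adjacent steps either belongs to the same unit or has its second step in an interleaving point. A minimal Python sketch of this reading follows; the `Step` record and its field names are assumptions of the sketch and only model the components s, io, ot and ip of the step information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    s: int            # scheduling parameter: index of the executing unit
    io: bool = False  # step is an IO-point
    ot: bool = False  # IO-point suitable for ownership transfer
    ip: bool = False  # step starts in an interleaving point


def ip_sched(rho):
    """IPsched: a unit switch may only happen at an interleaving point."""
    return all(a.s == b.s or b.ip for a, b in zip(rho, rho[1:]))
```

The empty sequence and every single-step sequence satisfy the predicate trivially, matching the implication in the definition.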
The idea behind the order reduction is based on the preservation of units’ local steps as well
as the order of accesses to shared resources. For this purpose we provide a special notation
from [Bau14b] extended also for the OT predicate introduced in this work.
Definition 2.21 (Step Subsequence Notation). For any step sequence ρ ∈ Σ∗S ∪ Θ∗S and a unit
index p we define the subsequence of steps of unit p as ρ|p, the IO-step subsequence of ρ as ρ|io,
and the OT -step subsequence of ρ as ρ|ot:
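\[
\rho|_{io} \overset{\mathrm{def}}{\equiv}
\begin{cases}
\alpha\,\tau|_{io} & : \rho = \alpha\tau \wedge \alpha.io \\
\tau|_{io} & : \rho = \alpha\tau \wedge \neg\alpha.io \\
\varepsilon & : \text{otherwise}
\end{cases}
\qquad
\rho|_{ot} \overset{\mathrm{def}}{\equiv}
\begin{cases}
\alpha\,\tau|_{ot} & : \rho = \alpha\tau \wedge \alpha.ot \\
\tau|_{ot} & : \rho = \alpha\tau \wedge \neg\alpha.ot \\
\varepsilon & : \text{otherwise}
\end{cases}
\]
\[
\rho|_{p} \overset{\mathrm{def}}{\equiv}
\begin{cases}
\alpha\,\tau|_{p} & : \rho = \alpha\tau \wedge \alpha.s = p \\
\tau|_{p} & : \rho = \alpha\tau \wedge \alpha.s \neq p \\
\varepsilon & : \text{otherwise}
\end{cases}
\]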
Now, we can define that two step sequences are equivalently reordered, meaning that in both
sequences the order of steps mentioned above is preserved. This definition is crucial for arguing
about existence of an equivalent interleaving point schedule for a given step sequence.
Definition 2.22 (Equivalent Reordering Relation). Given two step sequences ρ, ρ′ ∈ Σ∗S ∪ Θ∗S ,
we consider them equivalently reordered when the IO-step and OT-step subsequences as well
as the step subsequences of all units are the same:
\[ \rho \mathrel{\hat{=}} \rho' \overset{\mathrm{def}}{\equiv} \rho|_{io} = \rho'|_{io} \ \wedge\ \rho|_{ot} = \rho'|_{ot} \ \wedge\ \forall p \in \mathbb{N}_{nu}.\ \rho|_{p} = \rho'|_{p} \]
We also say that ρ′ is an equivalent reordering of ρ and, for any starting configuration C, that
(C, ρ′) is an equivalently reordered computation of (C, ρ). Note that =ˆ is an equivalence relation,
i.e., it is reflexive, symmetric, and transitive.
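The subsequence notation amounts to filtering a sequence, and the equivalent reordering relation compares the filtered sequences. The following Python sketch illustrates this; the `Step` record, the extra `inp` field used to tell steps apart, and the function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    s: int            # index of the executing unit
    inp: str = ""     # external input of the step (used here to distinguish steps)
    io: bool = False
    ot: bool = False
    ip: bool = False


def sub_unit(rho, p):
    """Subsequence of the steps of unit p."""
    return [a for a in rho if a.s == p]


def sub_io(rho):
    """IO-step subsequence."""
    return [a for a in rho if a.io]


def sub_ot(rho):
    """OT-step subsequence."""
    return [a for a in rho if a.ot]


def equiv_reordered(rho, rho2, nu):
    """Equivalent reordering: same IO, OT and per-unit subsequences."""
    return (sub_io(rho) == sub_io(rho2)
            and sub_ot(rho) == sub_ot(rho2)
            and all(sub_unit(rho, p) == sub_unit(rho2, p) for p in range(nu)))
```

Moving a local step of one unit past steps of another unit keeps the sequences equivalent, whereas swapping two IO-steps of different units does not, because the IO-step subsequence changes.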
As we have seen so far, one can instantiate a Cosmos model with a definition of interleaving
points independent from steps performing IO-operations. However, to make the reordering of
steps feasible we have to require that between two IO-points a unit passes through at least one
interleaving point and that all units of the system start their computations in interleaving points.
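Definition 2.23 (IOIP Condition). For any sequence ρ ∈ Σ∗S ∪ Θ∗S , the predicate IOIP(ρ)
denotes that every unit p starts in an interleaving point and there is at least one interleaving point
between any two IO-points of p.
\[
\begin{aligned}
IOIP(\rho) \overset{\mathrm{def}}{\equiv}\ & \forall \pi, p.\ \pi = \rho|_{p} \neq \varepsilon \implies \\
& \pi_1.ip \ \wedge\ (\forall \tau, \alpha, \varphi, \beta, \omega.\ \pi = \tau\alpha\varphi\beta\omega \wedge \alpha.io \wedge \beta.io \implies \exists i.\ \varphi_i.ip)
\end{aligned}
\]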
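The IOIP condition can be checked per unit by looking at consecutive IO-steps only, since an interleaving point between every consecutive pair also separates any two IO-steps further apart. The sketch below follows the formula as stated, so the interleaving point must lie strictly between the two IO-steps; the `Step` record and function name are assumptions of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    s: int            # index of the executing unit
    io: bool = False  # step is an IO-point
    ip: bool = False  # step starts in an interleaving point


def ioip(rho):
    """Illustrative check of the IOIP condition (Definition 2.23)."""
    for p in {a.s for a in rho}:
        pi = [a for a in rho if a.s == p]   # steps of unit p, non-empty here
        if not pi[0].ip:
            return False                     # unit p must start in an interleaving point
        io_idx = [i for i, a in enumerate(pi) if a.io]
        for i, j in zip(io_idx, io_idx[1:]):
            # some step strictly between two IO-steps must be an interleaving point
            if not any(a.ip for a in pi[i + 1:j]):
                return False
    return True
```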
According to [Bau14b] such an equivalent reordering does not break the IOIP condition.
Lemma 2.2 (Equivalent Reordering of IOIP Sequences). The IOIP condition is preserved by
equivalent reordering. For sequences ρ, ρ′ ∈ Σ∗S ∪Θ∗S of Cosmos machine S, we have
ρ =ˆ ρ′ ∧ IOIP(ρ) =⇒ IOIP(ρ′)
Now, using this fact and reasoning about sequences that fulfill the IOIP condition, one can
show the existence of IP-schedules.
Lemma 2.3 (Interleaving Point Schedule Existence). For every step sequence θ that fulfills the
IOIP condition, we can find an interleaving point schedule θ′ which is an equivalent reordering of θ:
IOIP(θ) =⇒ ∃θ′. θ′ =ˆ θ ∧ IPsched(θ′)
So, every transition sequence can be reordered into an IP-schedule preserving the IOIP
condition. However, one has to show that equivalent reordering also preserves computations
for some starting configuration. This fact was proven in [Bau14b] under the requirement that
computations of the Cosmos model obey the ownership policy.
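Lemma 2.4 (Safety of Reordered Computations). For Cosmos machine configurations C, C′ ∈ CS
and step sequences σ, σ′ ∈ Σ∗S we have
\[ C \overset{\sigma}{\longmapsto} C' \ \wedge\ safe(C, \sigma) \ \wedge\ \sigma \mathrel{\hat{=}} \sigma' \implies safe(C, \sigma') \ \wedge\ C \overset{\sigma'}{\longmapsto} C' \]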
Summing up Lemmas 2.2–2.4 we finally conclude that a given execution of a Cosmos machine
with arbitrarily interleaved unit steps can be simplified to an execution of the machine with an
equivalent IP-schedule.
Corollary 2.1. Any safe Cosmos machine computation (C, σ) resulting in a configuration C ′ and
fulfilling the IOIP condition can be reordered into an equivalent IP-schedule preserving the result of
the machine transition, the ownership safety and the IOIP condition
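\[
\begin{aligned}
& C \overset{\sigma}{\longmapsto} C' \ \wedge\ safe(C, \sigma) \ \wedge\ IOIP(\sigma) \implies \\
& \quad \exists \sigma'.\ \sigma' \mathrel{\hat{=}} \sigma \ \wedge\ IPsched(\sigma') \ \wedge\ IOIP(\sigma') \ \wedge\ safe(C, \sigma') \ \wedge\ C \overset{\sigma'}{\longmapsto} C'
\end{aligned}
\]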
For the proof of Lemma 2.4 Baumann introduced additional notation and properties for Cos-
mos machine configurations. Here we present only a few of them which we also explicitly rely
on in the present work.
Definition 2.24 (Cosmos Model Relations). We define the following equivalence relations on
Cosmos machine configurations C,C ′ ∈ CS and a unit p ∈ Nnu to denote (i) the equality of p’s
unit state, the read-only and owned memory contents, (ii) the equivalence of the ownership
configurations for p, (iii) the combination of both that we call a p-equivalence, (iv) the complete
equality of the ownership states, and (v) the equality of the shared addresses and contents of
the read-only and shared memory regions in the system.
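\[
\begin{aligned}
\text{(i)}\quad & C \overset{l}{\sim}_p C' \overset{\mathrm{def}}{\equiv} C.u_p = C'.u_p \ \wedge\ C.m|_{C.\mathcal{O}_p \cup \mathcal{R}} = C'.m|_{C.\mathcal{O}_p \cup \mathcal{R}} \\
\text{(ii)}\quad & C \overset{o}{\sim}_p C' \overset{\mathrm{def}}{\equiv} C.\mathcal{O}_p = C'.\mathcal{O}_p \ \wedge\ C.\mathcal{O}_p \cap C.\mathcal{S} = C'.\mathcal{O}_p \cap C'.\mathcal{S} \\
\text{(iii)}\quad & C \approx_p C' \overset{\mathrm{def}}{\equiv} C \overset{l}{\sim}_p C' \ \wedge\ C \overset{o}{\sim}_p C' \\
\text{(iv)}\quad & C \overset{o}{\sim} C' \overset{\mathrm{def}}{\equiv} \forall p \in \mathbb{N}_{nu}.\ C.\mathcal{O}_p = C'.\mathcal{O}_p \ \wedge\ C.\mathcal{S} = C'.\mathcal{S} \\
\text{(v)}\quad & C \overset{s}{\sim} C' \overset{\mathrm{def}}{\equiv} C.\mathcal{S} = C'.\mathcal{S} \ \wedge\ C.m|_{C.\mathcal{S} \cup \mathcal{R}} = C'.m|_{C.\mathcal{S} \cup \mathcal{R}}
\end{aligned}
\]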
As an important property concerning the third relation one can claim that an ownership safe
step of one unit does not change the configuration of any other unit, its owned and read-only
memories as well as its set of owned addresses.
Lemma 2.5 (Environment Steps). Given a step α ∈ ΣS and Cosmos machine configurations C, C′ ∈ CS
such that $C \overset{\alpha}{\mapsto} C'$ holds, we have
\[ \forall p \neq \alpha.s.\ safe(C, \alpha) \implies C \approx_p C' \]
Along with the ownership safety it is usually desirable to verify also other properties of the
machine execution, e.g. software conditions, well-formedness invariants, etc. Therefore, to
denote such a general property that should hold in any traversed state of a Cosmos machine
computation, we introduce a predicate P : CS → B and extend the ownership safety of a step
sequence accordingly:
\[ safe_P(C, \sigma) \overset{\mathrm{def}}{\equiv} safe(C, \sigma) \ \wedge\ \forall C' \in \mathcal{C}_S.\ C \overset{\sigma}{\longmapsto} C' \implies P(C') \]
Now, as a verification engineer dealing with an instantiated Cosmos machine, one would like
to know whether the verification of safety properties for all arbitrarily interleaved traces origi-
nating from a given starting configuration C ∈ CS is covered by verification of these properties
for all traces with IP-schedules starting in the same configuration. The order reduction theo-
rem proven in [Bau14b] easily answers this main question.
Definition 2.25 (Verified Cosmos Machine). We define a predicate safety(C,P ) which states
that for all Cosmos machine computations starting in C we can find an ownership annotation
such that the computation is safe and preserves the given property P .
safety(C,P ) ≡ ∀θ. comp(C.M, θ) =⇒ ∃o ∈ Ω∗S . safeP (C, 〈θ, o〉)
We also define a predicate safetyIP(C,P ) which expresses the same notion of verification for
all IP-schedule computations:
safetyIP(C,P ) ≡ ∀θ. IPsched(θ) ∧ comp(C.M, θ) =⇒ ∃o ∈ Ω∗S . safeP (C, 〈θ, o〉)
Additionally, all IP-schedules starting in C need to fulfill the IOIP condition.
IOIPIP(C) ≡ ∀θ. IPsched(θ) ∧ comp(C.M, θ) =⇒ IOIP(θ)
Thus, we consider a Cosmos machine ownership safe if any step sequence can be annotated
with an ownership transfer sequence such that the ownership policy is obeyed. Using the
definitions above we finally state the aforementioned order reduction theorem.
Theorem 2.1 (IP-Schedule Order Reduction). For a machine configuration C of a Cosmos model
S where all IP-schedule computations originating in C fulfill the IOIP condition, we can deduce the safety
property P and ownership safety on all possible computations from the verification of these properties on
all IP-schedules.
safetyIP(C,P ) ∧ IOIPIP(C) =⇒ safety(C,P )
To understand why this theorem actually works we need also to show that every Cosmos ma-
chine computation can be represented by an equivalently reordered IP-schedule computation.
This fact is proven in the following lemma:
Lemma 2.6 (Coverage). From safetyIP(C,P ) and IOIPIP(C) it follows that for any Cosmos
machine computation (C.M, θ) the IOIP condition is fulfilled and any equivalently reordered
IP-schedule can be executed from C.
safetyIP(C,P ) ∧ IOIPIP(C) ∧ comp(C.M, θ) =⇒
IOIP(θ) ∧ (∀θ′. θ =ˆ θ′ ∧ IPsched(θ′) =⇒ comp(C.M, θ′))
From now on, given the order reduction theorem, we can treat any arbitrary Cosmos machine
computation at the granularity of interleaving point schedules if the verification conditions are
satisfied.
2.3 System Simulation in Concurrency
As we already know, one usually guarantees a simulation for a sequential execution of a unit
for which the underlying semantics is defined without environment steps and ownership.
In the context of a concurrent execution of such units, however, it is solely the engineer's
responsibility to program the machine in such a way that the sequential simulation for each unit
is preserved and the code is executed as expected. The ownership safety policy considered in this
work models exactly such a strategy chosen by the programmer.
The goal of the rest of this chapter is to formulate and prove a general simulation theorem
between two Cosmos machines that shows how we treat the sequential simulation for each unit
in the concurrent context and allows us to transfer the ownership safety and other verified
properties to the implementation layer so that we can talk about them only for IP-schedules
where every interleaving point is a consistency point.
For this purpose we first introduce a simple notation that enables switching from
IP-schedules to so-called IP-blocks. Then, we state a general sequential simulation theorem which
covers the correctness of a sequential machine execution. As a last step, we provide the general
Cosmos model simulation theorem and its proof.
All these parts are described in detail in [Bau14b]. However, our attempts to apply the original
theory to different layers of the verification stack considered in our work revealed its restrictions
and showed that the theory does not guarantee some desired properties, which, in turn, makes
its application here infeasible. One of the main issues not taken into account in [Bau14b] is
that the local unit simulation cannot cover the whole system configuration including the local
memories of other units because some of them might not have reached their consistency points.
Therefore, the unit simulation relation in the concurrent context has to deal with a dynamically
changing subset of memory addresses. To make the theory more intuitive and to disambiguate
some of the ideas behind it, in our version we also adapt many of the definitions coming from the
original work. All this required us to extend the sequential simulation theorem and to provide a
reformulated Cosmos model simulation theorem as well as its modified proof.
2.3.1 Block Machine Semantics and Reduction to Block Computations
Since we may assume IP-schedules in our theory, we can simplify the concurrent machine
semantics to the subsequent atomic execution of blocks starting in interleaving points. We call
the machine implementing such semantics an IP-block machine or simply a block machine.
Definition 2.26 (IP-Block). A transition sequence λ ∈ Θ∗S is called an IP-block of a unit p ∈ Nnu
if it (i) contains only steps by the given unit, (ii) is empty or starts in an interleaving point, and
(iii) does not contain any further interleaving points.
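\[
\begin{aligned}
blk(\lambda, p) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \forall \alpha \in \lambda.\ \alpha.s = p \\
& \text{(ii)}\ \ \lambda \neq \varepsilon \implies \lambda_1.ip \\
& \text{(iii)}\ \ \lambda \neq \varepsilon \wedge \lambda = \lambda_1 \circ \lambda' \implies \forall \alpha \in \lambda'.\ \neg\alpha.ip
\end{aligned}
\]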
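Definition 2.27 (Block Machine Schedule). We define the predicate Bsched which denotes that
a given block sequence κ ∈ (Θ∗S)∗ is a block machine schedule:
\[ Bsched(\kappa) \overset{\mathrm{def}}{\equiv} \forall \lambda \in \kappa.\ \exists p \in \mathbb{N}_{nu}.\ blk(\lambda, p) \]
Obviously, the flattening concatenation $\lfloor\kappa\rfloor \overset{\mathrm{def}}{\equiv} \kappa_1 \cdots \kappa_{|\kappa|} \in \Theta^*_S$ of all blocks of κ is an
IP-schedule.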
Instead of defining an additional transition function for the block machine we easily extend
our step sequence notation to block sequences.
Definition 2.28 (Step Notation for Block Sequences). Given two machine states M,M ′ ∈ MS
and a block machine schedule κ ∈ (Θ∗S)∗, we denote that M ′ is reached by executing the block
machine from state M wrt. schedule κ as follows
\[ M \overset{\kappa}{\longmapsto} M' \overset{\mathrm{def}}{\equiv} M \overset{\lfloor\kappa\rfloor}{\longmapsto} M' \]
Obviously, one can argue that any IP-schedule can be transformed into a block schedule.
Lemma 2.7 (Block Machine Schedule Existence). For any IP-schedule θ we can find a correspond-
ing block schedule κ such that the flattening concatenation of κ’s blocks equals θ.
\[ \forall \theta \in \Theta^*_S.\ IPsched(\theta) \implies \exists \kappa \in (\Theta^*_S)^*.\ Bsched(\kappa) \ \wedge\ \lfloor\kappa\rfloor = \theta \]
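One way to picture the block schedule of Lemma 2.7 is to cut an IP-schedule in front of every interleaving point. The Python sketch below does this and re-checks the result with illustrative versions of blk and Bsched. It additionally assumes that the first step of the schedule is an interleaving point, as the IOIP condition would guarantee; the `Step` record and all function names are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    s: int            # index of the executing unit
    ip: bool = False  # step starts in an interleaving point


def blk(lam, p):
    """IP-block of unit p (Definition 2.26)."""
    return (all(a.s == p for a in lam)            # (i) only steps of unit p
            and (not lam or lam[0].ip)            # (ii) empty or starts in an IP
            and all(not a.ip for a in lam[1:]))   # (iii) no further IPs


def bsched(kappa, nu):
    """Block machine schedule: every block is an IP-block of some unit."""
    return all(any(blk(lam, p) for p in range(nu)) for lam in kappa)


def flatten(kappa):
    """Flattening concatenation of all blocks."""
    return [a for lam in kappa for a in lam]


def to_blocks(theta):
    """Cut a schedule in front of every interleaving point."""
    blocks = []
    for a in theta:
        if a.ip or not blocks:
            blocks.append([a])
        else:
            blocks[-1].append(a)
    return blocks
```

Flattening the blocks returned by `to_blocks` always yields the input sequence again; whether the blocks form a block schedule depends on the input being an IP-schedule that starts in an interleaving point.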
Baumann showed that we can reduce the verification of IP-schedules to the verification of
block machine computations.
Definition 2.29 (Block Machine Safety). We call a block machine with an initial Cosmos ma-
chine configuration C ∈ CS safe wrt. the ownership policy and some property P if the follow-
ing predicate holds
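\[
\begin{aligned}
safety_B(C, P) \overset{\mathrm{def}}{\equiv}\ & \forall \kappa \in (\Theta^*_S)^*.\ Bsched(\kappa) \ \wedge\ comp(C.M, \lfloor\kappa\rfloor) \implies \\
& \exists o \in \Omega^*_S.\ safe_P(C, \langle \lfloor\kappa\rfloor, o \rangle)
\end{aligned}
\]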
Then, the following theorem proven in [Bau14b] justifies the desired reduction.
Theorem 2.2 (Block Machine Reduction). Let C be a configuration of Cosmos model S and P be a
Cosmos model safety property. Then if all block machine computations running out of C are ownership
safe and preserve P , the same holds for all IP schedules starting in C.
safetyB(C,P ) =⇒ safetyIP(C,P )
2.3.2 Generalized Sequential Simulation Theorem
Consider two Cosmos model instantiations Sc, Sa ∈ S and machine computations (d, σ) ∈MSc×
Θ∗Sc and (e, τ) ∈ MSa × Θ∗Sa such that the steps of the abstract layer Sa are simulated by the
concrete Sc. The interleaving points of both machines are instantiated as consistency points and
unit transitions are performed now on so called consistency blocks substituting the term IP-block
in the simulation context.
Thanks to the order reduction theorem, it is clear that the simulation theorem proven for the
sequential execution without environment steps can now be applied for any consistency block.
In our theory of Cosmos models we provide its generalized version that needs to be instantiated
with respect to the sequential simulation between the given concrete and abstract machines. As
we mentioned before, in contrast to [Bau14b], the sequential simulation has to take into account
the dynamically changed subsets of addresses at which the memory consistency might not hold
(inconsistent memory). Moreover, we only demand the same number of computation units in
both machines Sa.nu = Sc.nu = nu and do not require that the memories of both machines
have compatible types (i.e., Sc.A ⊇ Sa.A and Sc.V = Sa.V) because the model Sa might contain
an abstraction of the implementation memory.
For this purpose we define a so-called sequential simulation framework with components usually
having counterparts in the sequential simulation between any two machines. Below, for
brevity we denote all components Sa.X and Sc.X of the corresponding machines as Xa and
Xc respectively.
Definition 2.30 (Sequential Simulation Framework). We introduce a type R for simulation
frameworks RSaSc ∈ R which contains all the information needed to state the generalized simu-
lation theorem relating sequential computations of units of Cosmos models Sc and Sa.
\[ \mathcal{R} \overset{\mathrm{def}}{\equiv} (P, sim, CP_a, CP_c, wf_a, wf_c, suit, sc, wb) \]
• P – a set of static simulation parameters ({⊥} if there are none) common for all units.
• sim : Lc × La × P × 2Aa → B – a simulation relation between given units of Sc and Sa,
where Lx ≡ (Ux × (Ax → Vx)) with x ∈ {c, a} is a shorthand for a sequential machine con-
figuration containing a state of the unit of the concurrent machine and the whole memory.
The parameter of the type 2Aa represents a set of addresses for the possible inconsistent
memory.
• CPa : Ua × P → B – a predicate identifying consistency points of the abstract Sa.
• CPc : Uc × P → B – a predicate identifying consistency points of the concrete Sc.
• wfa : La → B – a well-formedness condition for a sequential machine configuration for a
given unit of the abstract Sa.
• wfc : Lc → B – a well-formedness condition for a sequential machine configuration of the
concrete Sc, required for the simulation of sequential computations of Sa.
• suit : Ec → B – a predicate determining whether a given input of the concrete machine is
suitable for the simulation.
• sc : La × Ea × P → B – software conditions that enable a simulation of sequential com-
putations of Sa. It is defined for a sequential machine configuration of a given unit and a
corresponding input for a unit’s step.
• wb : Lc×Ec×P → B – a predicate that restricts the simulating computations of Sc. We say
that a simulating step in a computation of Sc is well-behaved iff it fulfills this restriction.
The property becomes especially important when we consider a verification stack and
need to transfer a part of software conditions to the implementation layer.
To apply the sequential simulation framework in the context of machine step sequences we
introduce additional shorthands for d ∈ MSc , e ∈ MSa , p ∈ Nnu, par ∈ P (where P ≡ RSaSc .P),
icm ∈ 2Aa , steps α ∈ ΘSa , β ∈ ΘSc , and step sequences ω ∈ Θ∗Sc , τ ∈ Θ∗Sa :
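\[
\begin{aligned}
sim_p(d, e, par, icm) &\equiv R^{S_a}_{S_c}.sim((d.u(p), d.m), (e.u(p), e.m), par, icm) \\
CP_p(e, par) &\equiv R^{S_a}_{S_c}.CP_a(e.u(p), par) \\
CP_p(d, par) &\equiv R^{S_a}_{S_c}.CP_c(d.u(p), par) \\
wf_p(e) &\equiv R^{S_a}_{S_c}.wf_a(e.u(p), e.m) \\
wf_p(d) &\equiv R^{S_a}_{S_c}.wf_c(d.u(p), d.m) \\
suit(\beta) &\equiv R^{S_a}_{S_c}.suit(\beta.in) \\
suit(\omega) &\equiv \forall \beta \in \omega.\ suit(\beta) \\
wb(d, \beta, par) &\equiv R^{S_a}_{S_c}.wb((d.u(\beta.s), d.m), \beta.in, par) \\
wb(d, \omega, par) &\equiv \forall \theta, \alpha, \theta', d'.\ \omega = \theta\alpha\theta' \wedge d \overset{\theta}{\longmapsto} d' \implies wb(d', \alpha, par) \\
sc(e, \alpha, par) &\equiv R^{S_a}_{S_c}.sc((e.u(\alpha.s), e.m), \alpha.in, par) \\
sc(e, \tau, par) &\equiv \forall \theta, \alpha, \theta', e'.\ \tau = \theta\alpha\theta' \wedge e \overset{\theta}{\longmapsto} e' \implies sc(e', \alpha, par)
\end{aligned}
\]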
If the given sequence does not match the computation, namely io, ot or/and ip do not corre-
spond to configurations in the computation, or the transition is undefined, the predicates wb
and sc can still hold. This issue is resolved when we use the predicates under the requirement
of computation existence.
Along with the software conditions, we introduce also a predicate requiring that the incon-
sistent region of memory is not accessed by a unit during its step.
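Definition 2.31 (No Access to Inconsistent Memory). For abstract machine configurations
e, e′ ∈ MSa , icm ∈ 2Aa and an existing step $e \overset{\alpha}{\mapsto} e'$ with α ∈ ΘSa we define
\[ noacc(e, \alpha, icm) \overset{\mathrm{def}}{\equiv} (reads_{\alpha.s}(e, \alpha.in) \cup writes_{\alpha.s}(e, \alpha.in)) \cap icm = \emptyset \]
The definition easily extends to a sequence τ ∈ Θ∗Sa :
\[ noacc(e, \tau, icm) \equiv \forall \theta, \alpha, \theta', e'.\ \tau = \theta\alpha\theta' \wedge e \overset{\theta}{\longmapsto} e' \implies noacc(e', \alpha, icm) \]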
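Read operationally, the sequence version of noacc checks every step in the state reached by the steps before it. The sketch below illustrates this for a deterministic machine; the callables `delta`, `reads` and `writes` are hypothetical stand-ins for the instantiated transition function and the reads- and writes-sets and are not part of the original definitions.

```python
def noacc(e, tau, icm, delta, reads, writes):
    """Illustrative check that no step of tau touches the inconsistent region icm.

    delta(e, alpha) returns the successor state; reads(e, alpha) and
    writes(e, alpha) return the address sets accessed by step alpha in state e.
    """
    for alpha in tau:
        if (reads(e, alpha) | writes(e, alpha)) & icm:
            return False          # step alpha accesses inconsistent memory
        e = delta(e, alpha)       # advance to the state reached after alpha
    return True
```

As a toy instance, take states to be counters where a step in state e reads address e and writes address e + 1: starting from 0 with icm = {5}, three steps are accepted, while five steps are rejected because the fifth one writes address 5.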
Since our order reduction theory assumes the IOIP condition, we have to guarantee at least
one consistency point between two IO-points. For example, in the case of compiler correctness,
the compiler is supposed to insert the consistency points into the compiled code in the proper
way. Moreover, there must be a one-to-one mapping of the IO-points suitable for the ownership
transfer, and any IO-step performed on the abstract level also has to be implemented by the
concrete machine.
In contrast to [Bau14b], where the consistency blocks of both machines must contain the same
number of IO-operations, our requirement is weaker and allows us to argue about IO-steps of
the concrete machine that have no effect on the abstract level. As an example, one could consider
an unsuccessful attempt to acquire a lock which is implemented as a read of a shared resource
without an ownership transfer. In this case, one could reduce the IO-operation to an empty
step of the abstract machine.
Definition 2.32 (Requirement on IO-points in Consistency Blocks). We denote the require-
ments on IO-points in consistency blocks by an overloaded predicate oneIO. For a transition
sequence ω ∈ Θ∗Sa ∪ Θ∗Sc it demands that the sequence contains at most one IO-step. For two
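such sequences σ ∈ Θ∗Sc , σ ≠ ε, and τ ∈ Θ∗Sa , however, we require the same number of OT-steps,
and if τ contains an IO-operation, so does σ.
\[
\begin{aligned}
oneIO(\omega) \overset{\mathrm{def}}{\equiv}\ & \forall i, j \in \mathbb{N}_{|\omega|}.\ \omega_i.io \wedge \omega_j.io \implies i = j \\
oneIO(\sigma, \tau) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ oneIO(\sigma) \wedge oneIO(\tau) \\
& \text{(ii)}\ \ \sigma|_{ot} \neq \varepsilon \iff \tau|_{ot} \neq \varepsilon \\
& \text{(iii)}\ \ \tau|_{io} \neq \varepsilon \implies \sigma|_{io} \neq \varepsilon
\end{aligned}
\]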
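The failed lock acquisition mentioned earlier gives a concrete instance of this requirement: a concrete block with one IO-step and no ownership transfer may be matched with an empty abstract block. The following Python sketch of both variants of oneIO is illustrative only; the `Step` record and the function names are assumptions, and the non-emptiness of σ is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    s: int            # index of the executing unit
    io: bool = False  # step is an IO-point
    ot: bool = False  # IO-point suitable for ownership transfer
    ip: bool = False  # step starts in an interleaving point


def has_io(seq):
    return any(a.io for a in seq)


def has_ot(seq):
    return any(a.ot for a in seq)


def one_io(omega):
    """At most one IO-step in the sequence."""
    return sum(1 for a in omega if a.io) <= 1


def one_io_pair(sigma, tau):
    """oneIO for a concrete block sigma and an abstract block tau."""
    return (one_io(sigma) and one_io(tau)         # (i)
            and has_ot(sigma) == has_ot(tau)      # (ii) OT-step in both or in neither
            and (not has_io(tau) or has_io(sigma)))  # (iii) abstract IO is implemented
```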
Note that this requirement is still too strong for a simulation where the concrete machine
accesses some shared resources invisible on the abstract level and supports the ownership transfer
for them. Since we do not deal with such models in the scope of this thesis, we leave this
issue for future work.
The sequential simulation theorem can be applied to any consistency block on the implementation
level in order to obtain an abstract consistency block executed by the same unit. However,
a given consistency block for a concrete model might be incomplete, meaning that the computation
unit has not reached the next consistency point yet. Therefore, one has to find an extension
of that incomplete block, so that the resulting complete concrete block implements an abstract
one. Via this extension we can show exactly the existence of the next consistency point, which
must be claimed in the generalized sequential simulation theorem.
Definition 2.33 (Extension of Consistency Block). The extension of a given consistency block
ω ∈ Θ∗Sc is denoted by $\omega \rhd^{blk}_{p} \sigma$ and says that σ ∈ Θ∗Sc extends ω without adding consistency
points to the block. The given ω can be considered as a prefix of the non-empty σ.
\[ \omega \rhd^{blk}_{p} \sigma \overset{\mathrm{def}}{\equiv} \exists \tau.\ \sigma = \omega\tau \neq \varepsilon \ \wedge\ blk(\sigma, p) \ \wedge\ blk(\omega, p) \]
Using the definitions introduced above we can finally state the generalized sequential simu-
lation theorem for two Cosmos models Sa, Sc ∈ S and a corresponding simulation framework
RSaSc ∈ R.
Theorem 2.3 (Generalized Sequential Simulation Theorem). For any couple of abstract and con-
crete machine configurations e ∈ MSa , d ∈ MSc , any unit p ∈ Nnu, and any possible inconsistent
portion of memory icm ∈ 2Aa ,
• if the unit p in both machines is in a consistency point, well-formed and coupled via the simulation
relation wrt. icm and a simulation parameter par ∈ P , and
• if all complete consistency block computations of the unit p in the abstract machine starting from e
obey the software conditions, do not access icm and lead to well-formed configurations,
then for any existing suitable non-empty computations ω ∈ Θ∗Sc from d of the unit p we guarantee that
there exist further suitable steps of the same unit leading to the next consistency point such that
• the steps of the resulting complete block are suitable and well-behaved,
• implement an existing complete consistency block of the abstract machine,
• and the simulation relation excluding the same icm holds after the steps of both machines in their
well-formed configurations for the unit p.
• Moreover, the abstract and concrete complete consistency blocks contain at most one IO-step, an
IO-step of the abstract machine is always implemented by the concrete machine, and an ownership
transfer is performed either in the computations of both machines or in neither of them.
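\[
\begin{aligned}
& \forall d \in \mathcal{M}_{S_c},\ e \in \mathcal{M}_{S_a},\ par \in P,\ p \in \mathbb{N}_{nu},\ icm \in 2^{\mathcal{A}_a},\ \omega \in \Theta^*_{S_c}. \\
& \quad \text{(i)}\ \ wf_p(d) \wedge wf_p(e) \\
& \quad \text{(ii)}\ \ sim_p(d, e, par, icm) \wedge CP_p(d, par) \wedge CP_p(e, par) \\
& \quad \text{(iii)}\ \ \forall \lambda \in \Theta^*_{S_a}, e' \in \mathcal{M}_{S_a}.\ e \overset{\lambda}{\longmapsto} e' \wedge blk(\lambda, p) \wedge CP_p(e', par) \implies \\
& \quad \qquad\ noacc(e, \lambda, icm) \wedge sc(e, \lambda, par) \wedge wf_p(e') \\
& \quad \text{(iv)}\ \ \omega \neq \varepsilon \wedge blk(\omega, p) \wedge suit(\omega) \wedge comp(d, \omega) \\
& \implies \\
& \exists \sigma \in \Theta^*_{S_c},\ d' \in \mathcal{M}_{S_c},\ \tau \in \Theta^*_{S_a},\ e' \in \mathcal{M}_{S_a}. \\
& \quad \text{(i)}\ \ \omega \rhd^{blk}_{p} \sigma \wedge suit(\sigma) \\
& \quad \text{(ii)}\ \ blk(\tau, p) \wedge oneIO(\sigma, \tau) \\
& \quad \text{(iii)}\ \ d \overset{\sigma}{\longmapsto} d' \wedge wb(d, \sigma, par) \wedge wf_p(d') \\
& \quad \text{(iv)}\ \ e \overset{\tau}{\longmapsto} e' \wedge wf_p(e') \\
& \quad \text{(v)}\ \ sim_p(d', e', par, icm) \wedge CP_p(d', par) \wedge CP_p(e', par)
\end{aligned}
\]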
Note that for the simulated computation (e, τ) we only demand progress (i.e., τ 6= ε) in case
σ contains IO-steps suitable for the ownership transfer. This easily follows from the definition
of oneIO(σ, τ).
Moreover, the claim for all icm allows us to apply the theorem for concurrent systems where
environment steps are allowed. As we know already from the order reduction, the interleaving
of blocks can appear only in consistency points. Therefore, before the execution of a unit p there
might be an incomplete block of another unit q 6= p that destroyed the consistency of some
region of memory. So, in contrast to the classical inductive proofs for systems with a single
unit done for the static icm = ∅, our formulation covers all possible cases of environment steps.
In the presence of the ownership state and safety policy, we will be able to see which memory
portions exactly could be inconsistent.
Now, since the sequential simulation theorem guarantees the existence of consistency points, we
can consider sequences of consistency blocks for the abstract and implementation layers and
formulate our final correctness statement for concurrent systems.
2.3.3 Cosmos Model Simulation
We have already mentioned the complete and incomplete consistency blocks. First, we need to
provide their formal definitions.
Consistency Block Machines
Definition 2.34 (Interleaving Points are Consistency Points). Given a sequential simulation
framework $R^{S_a}_{S_c}$ which relates two Cosmos models Sc and Sa and a simulation parameter par ∈ P
we define a predicate denoting that in Sc and Sa the interleaving points are set up to be exactly
the consistency points.
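\[
\begin{aligned}
IPCP(R^{S_a}_{S_c}, par) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \forall d \in \mathcal{M}_{S_c}, \alpha \in \Theta_{S_c}.\ IP_{\alpha.s}(d, \alpha.in) \iff CP_{\alpha.s}(d, par) \\
& \text{(ii)}\ \ \forall e \in \mathcal{M}_{S_a}, \beta \in \Theta_{S_a}.\ IP_{\beta.s}(e, \beta.in) \iff CP_{\beta.s}(e, par)
\end{aligned}
\]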
Definition 2.35 (Consistency Block Machine Schedule). A block machine schedule
$\kappa \in (\Theta^*_{S_c})^* \cup (\Theta^*_{S_a})^*$ is a consistency block machine schedule iff the following property holds:
\[ CPsched(\kappa, par) \overset{def}{\equiv} Bsched(\kappa) \land IPCP(R^{S_a}_{S_c}, par) \]
In the concurrent model simulation theorem between the concrete and abstract machines
we will aim at the verification of safety properties and software conditions for all consistency
block machine computations of the abstract machine with all units being in their consistency
points after each block execution. Therefore for such a consistency block schedule we add a
corresponding requirement.
Definition 2.36 (Complete Consistency Block Machine Computation). A complete consistency
block machine computation of a block schedule $\nu \in (\Theta^*_{S_a})^*$ starting from a configuration $e \in M_{S_a}$
is a block machine computation where all computation units are in consistency points in every
configuration:
\[
\begin{array}{l}
CPsched^{c}(e, \nu, par) \overset{def}{\equiv} CPsched(\nu, par) \land \forall \nu', \nu'' \in (\Theta^*_{S_a})^*, e' \in M_{S_a}. \\
\qquad \nu = \nu'\nu'' \land e \overset{\nu'}{\longmapsto} e' \implies \forall p \in \mathbb{N}_{nu}.\ CP_p(e', par)
\end{array}
\]
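The same definition can be overloaded for the concrete machine with $d \in M_{S_c}$ and $\kappa \in (\Theta^*_{S_c})^*$:
\[
\begin{array}{l}
CPsched^{c}(d, \kappa, par) \overset{def}{\equiv} CPsched(\kappa, par) \land \forall \kappa', \kappa'' \in (\Theta^*_{S_c})^*, d' \in M_{S_c}. \\
\qquad \kappa = \kappa'\kappa'' \land d \overset{\kappa'}{\longmapsto} d' \implies \forall p \in \mathbb{N}_{nu}.\ CP_p(d', par)
\end{array}
\]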
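The requirement that all units are in consistency points after every prefix of blocks can be illustrated with a minimal Python sketch. It is not part of the formal development: the toy machine (one counter per unit, a unit being in a consistency point when its counter is even), the step function, and the names `toy_step`, `toy_cp`, `run_block` and `all_units_in_cp_after_each_block` are assumptions chosen only for illustration, and the conjunct CPsched is not modelled.

```python
from typing import Callable, List, Sequence, Tuple

Config = Tuple[int, ...]   # assumed toy configuration: one counter per unit
Step = int                 # assumed toy step: index of the unit that executes it


def toy_step(config: Config, unit: Step) -> Config:
    """Toy transition: the stepping unit increments its own counter."""
    return tuple(v + 1 if i == unit else v for i, v in enumerate(config))


def toy_cp(unit: int, config: Config) -> bool:
    """Toy consistency point: the unit's counter is even."""
    return config[unit] % 2 == 0


def run_block(config: Config, block: Sequence[Step],
              step: Callable[[Config, Step], Config]) -> Config:
    """Execute all steps of one block in order."""
    for s in block:
        config = step(config, s)
    return config


def all_units_in_cp_after_each_block(
        start: Config, schedule: List[Sequence[Step]],
        step: Callable[[Config, Step], Config],
        cp: Callable[[int, Config], bool]) -> bool:
    """Check cp for every unit after every prefix of blocks, including the empty prefix."""
    config = start
    units = range(len(start))
    if not all(cp(p, config) for p in units):
        return False
    for block in schedule:
        config = run_block(config, block, step)
        if not all(cp(p, config) for p in units):
            return False
    return True
```

In this toy setting the schedule `[[0, 0], [1, 1]]` is complete, whereas `[[0], [0]]` is not, because unit 0 is outside a consistency point after the first block.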
For block schedules of concrete and abstract machines we introduce a definition of the sched-
ule equivalence that will be later used in the simulation between both machines.
Definition 2.37 (Block Schedules Equivalence). For a block schedule $\kappa \in (\Theta^*_{S_c})^*$ of the concrete
Cosmos model with non-empty blocks and a block schedule $\nu \in (\Theta^*_{S_a})^*$ of the abstract Cosmos
model with possibly empty blocks (or for two empty schedules) we introduce an equivalence relation
denoting that both schedules have the same length and that corresponding blocks in these
schedules belong to the same unit. Moreover, each pair of corresponding blocks obeys the condition
oneIO.
\[
\begin{array}{lll}
\nu \overset{sch}{\sim} \kappa & \overset{def}{\equiv} & (i)\ |\nu| = |\kappa| \\
 & & (ii)\ \forall j \in \mathbb{N}.\ j \le |\kappa| \implies (a)\ oneIO(\kappa_j, \nu_j) \\
 & & \qquad (b)\ \forall p, q \in \mathbb{N}_{nu}.\ blk(\nu_j, q) \land \nu_j \neq \varepsilon \land blk(\kappa_j, p) \implies p = q
\end{array}
\]
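The two conditions of the schedule equivalence can be read as a simple check over pairs of corresponding blocks. The following Python sketch illustrates this under stated assumptions: the step encoding (a pair of unit index and IO flag) and the names `blk`, `one_io_stub` and `sch_equiv` are hypothetical, and `one_io_stub` is only a simplified stand-in for oneIO, whose actual definition is given elsewhere in the thesis.

```python
from typing import Callable, Sequence, Tuple

Step = Tuple[int, bool]        # assumed encoding: (executing unit, is the step an IO step?)
Block = Sequence[Step]


def blk(block: Block, unit: int) -> bool:
    """The block consists only of steps of `unit` (vacuously true if empty)."""
    return all(s == unit for s, _ in block)


def one_io_stub(sigma: Block, tau: Block) -> bool:
    """Simplified stand-in for oneIO, NOT the definition from the thesis:
    at most one IO step per block, and an IO step in sigma forces tau to be non-empty."""
    io_sigma = sum(1 for _, io in sigma if io)
    io_tau = sum(1 for _, io in tau if io)
    return io_sigma <= 1 and io_tau <= 1 and (io_sigma == 0 or len(tau) > 0)


def sch_equiv(nu: Sequence[Block], kappa: Sequence[Block], units: Sequence[int],
              one_io: Callable[[Block, Block], bool]) -> bool:
    """Check the block schedule equivalence between abstract nu and concrete kappa."""
    if len(nu) != len(kappa):                       # condition (i)
        return False
    for nu_j, kappa_j in zip(nu, kappa):
        if not one_io(kappa_j, nu_j):               # condition (ii.a)
            return False
        if len(nu_j) > 0:                           # condition (ii.b)
            for p in units:
                for q in units:
                    if blk(nu_j, q) and blk(kappa_j, p) and p != q:
                        return False
    return True
```

Passing `one_io` as a parameter keeps the sketch independent of the concrete definition of oneIO; an empty abstract block is accepted for a concrete block of any unit, exactly as condition (ii.b) permits.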
Verification of the Abstract Machine
Analogously to the block machine safety we define the notion of the safety for our complete
consistency block machine computations.
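Definition 2.38 (Complete Consistency Block Machine Safety). The verification of the ownership
safety and some property P for all complete block machine computations running out of a
configuration $C \in C_{S_a} \cup C_{S_c}$ is denoted by a predicate $safety^{c}_{B}(C, P, par)$ with par ∈ P, where,
depending on the machine in question, Ω is either $\Omega_{S_a}$ or $\Omega_{S_c}$, and Θ is either $\Theta_{S_a}$ or $\Theta_{S_c}$:
\[
\begin{array}{l}
safety^{c}_{B}(C, P, par) \overset{def}{\equiv} \forall \kappa \in (\Theta^*)^*. \\
\qquad (i)\ CPsched^{c}(C.M, \kappa, par) \\
\qquad (ii)\ comp(C.M, \lfloor\kappa\rfloor) \\
\qquad \implies \exists o \in \Omega^*.\ safe_P(C, \langle \lfloor\kappa\rfloor, o \rangle)
\end{array}
\]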
As we can see this definition of safety taken from [Bau14b] does not directly correspond to
the previously considered statement of the block machine safety but has a similar form. In fact,
given that it is always possible to reach a consistency point we could prove the reduction of
arbitrary consistency block machine schedules to the complete ones. However, the existence
of consistency points and aforementioned reachability can only be justified via the simulation
between two machines. Moreover, we do not use the concurrent simulation theorem to show
that the abstract machine properties verified for complete consistency block machine computa-
tions can be transferred to arbitrary interleaving schedules on the same abstract level. Thus it is
useless to treat the reduction of incomplete blocks on a single layer of abstraction.
As follows from the predicate $safety^{c}_{B}(C, P, par)$, we are interested in the concurrent simulation
for complete block abstract machine computations for which a safe ownership transfer
annotation exists and the property P holds in the configuration C and after each block execution.
To denote this for a block machine schedule $\kappa \in (\Theta^*)^*$ and an ownership transfer annotation
$o \in \Omega^*$ with $|o| = |\lfloor\kappa\rfloor|$, we define the following predicate, which serves merely as a shorthand
in the simulation theorem:
\[
\begin{array}{l}
safe^{c}_{B}(C, P, \kappa, o) \overset{def}{\equiv} \forall \kappa', \kappa'' \in (\Theta^*)^*, o', o'' \in \Omega^*. \\
\qquad \kappa = \kappa'\kappa'' \land o = o'o'' \land |o'| = |\lfloor\kappa'\rfloor| \implies safe_P(C, \langle \lfloor\kappa'\rfloor, o' \rangle)
\end{array}
\]
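The shorthand quantifies over all block-aligned prefixes of the schedule together with the annotation prefix of matching length. A minimal Python sketch of this reading is given below; the names `flatten`, `safe_cb` and `toy_safe_p` are hypothetical, the safety predicate is passed in as a parameter, and `toy_safe_p` is an arbitrary toy predicate used only to exercise the sketch.

```python
from typing import Any, Callable, List, Sequence


def flatten(kappa: Sequence[Sequence[Any]]) -> List[Any]:
    """Concatenate the blocks of a block schedule into one step sequence."""
    return [step for block in kappa for step in block]


def safe_cb(config: Any, kappa: Sequence[Sequence[Any]], o: Sequence[Any],
            safe_p: Callable[[Any, List[Any], List[Any]], bool]) -> bool:
    """safe_p must hold for every prefix of blocks with the matching annotation prefix."""
    for k in range(len(kappa) + 1):
        steps = flatten(kappa[:k])
        if len(steps) > len(o):
            continue          # no annotation prefix of matching length exists
        if not safe_p(config, steps, list(o[:len(steps)])):
            return False
    return True


def toy_safe_p(config: Any, steps: List[Any], annotation: List[Any]) -> bool:
    """Toy safety predicate (an assumption): no annotation entry is marked 'bad'."""
    return "bad" not in annotation
```

Only prefixes that end at a block boundary are inspected, which mirrors the fact that the property is required in the initial configuration and after each block execution.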
Obviously, by induction on the length of the block schedule sequence, one can easily show the
correspondence between these definitions:
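\[
\begin{array}{l}
safety^{c}_{B}(C, P, par) \implies \forall \kappa \in (\Theta^*)^*. \\
\qquad (i)\ CPsched^{c}(C.M, \kappa, par) \\
\qquad (ii)\ comp(C.M, \lfloor\kappa\rfloor) \\
\qquad \implies \exists o \in \Omega^*.\ safe^{c}_{B}(C, P, \kappa, o)
\end{array}
\]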
Analogously to the safety properties, it is enough to prove the software conditions (SC) only
for complete consistency blocks on the abstract level.
Definition 2.39 (Verification of SC and Well-Formedness for Complete Block Machines). For
all complete block machine schedules running out of a configuration $e \in M_{S_a}$ we define a
predicate denoting the verification of software conditions and the well-formedness invariant
for all abstract complete block machine computations:
\[
\begin{array}{l}
scwf^{c}_{B}(e, par) \overset{def}{\equiv} \forall \eta \in (\Theta^*_{S_a})^*. \\
\qquad (i)\ CPsched^{c}(e, \eta, par) \\
\qquad (ii)\ comp(e, \lfloor\eta\rfloor) \\
\qquad \implies \\
\qquad (i)\ sc(e, \lfloor\eta\rfloor, par) \\
\qquad (ii)\ \forall e' \in M_{S_a}.\ e \overset{\eta}{\longmapsto} e' \implies \forall p \in \mathbb{N}_{nu}.\ wf_p(e')
\end{array}
\]
Definition 2.40 (Verified Abstract Cosmos Machine). We call the abstract Cosmos machine
$S_a$ starting its computations in a configuration $E \in C_{S_a}$ verified wrt. the simulation parameter
par ∈ P and the property $P_{S_a}$ if and only if the predicates $safety^{c}_{B}(E, P_{S_a}, par)$ and
$scwf^{c}_{B}(E.M, par)$ hold.
Invariants and Simulation Relation Between Cosmos Machines
First, we introduce a few auxiliary definitions which will be used in the simulation relation
between two machines. We already know that in the concrete machine computations not all
units might reach the consistency points.
Definition 2.41 (Units in Consistency Points). Given a machine configuration $d \in M_{S_c}$ and a
simulation parameter par as above we can define the set $U_{cp}$ of computation units of d that are
in consistency points wrt. the simulation parameter par.
\[ U_{cp}(d, par) \overset{def}{\equiv} \{ p \in \mathbb{N}_{nu} \mid CP_p(d, par) \} \]
As mentioned before, in the concurrent execution of the consistency blocks the simulation
relation of a unit should exclude memory addresses for which the memory might be
inconsistent because of units that have not reached their consistency points. Obviously, as such a memory
region for a given unit we can choose other units’ local addresses because the unit does not
access them anyway.
Definition 2.42 (Other Units’ Local Addresses). For a unit p ∈ Nnu of a model S ∈ S we
compute a set of local addresses of all other units from the ownership state G ∈ GS as follows:
\[ SO(\mathcal{G}, p) \overset{def}{\equiv} \bigcup_{q \neq p} \mathcal{G}.O(q) \setminus \mathcal{G}.S \]
Then, for a given unit one can prove an easy lemma about the whole memory space excluding
other units’ local addresses.
Lemma 2.8 (Memory Addresses without other Units’ Local Addresses). For any Cosmos machine
S ∈ S, its configuration $E \in C_S$ and a unit $p \in \mathbb{N}_{nu}$, if the ownership invariant oinv(E) holds,
then the set of all memory addresses without the owned non-shared addresses of all units except p is equal
to the union of the read-only addresses, the shared addresses, and the addresses owned by the unit p.
\[ oinv(E) \implies S.A \setminus SO(E.\mathcal{G}, p) = S.R \cup E.S \cup (E.O_p \setminus E.S) \]
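Proof: To prove the claim we unfold the definition of $SO(E.\mathcal{G}, p)$ and use parts (i)-(iii) of the
invariant oinv(E) to simplify the computations on the given sets of addresses.
\[
\begin{array}{rcll}
S.A \setminus SO(E.\mathcal{G}, p) & = & S.A \setminus \Bigl( \bigcup_{q \neq p} E.O_q \setminus E.S \Bigr) & \\
 & = & \Bigl( S.R \cup E.S \cup \bigcup_{q} E.O_q \Bigr) \setminus \Bigl( \bigcup_{q \neq p} E.O_q \setminus E.S \Bigr) & \text{by } oinv(E).(iii) \\
 & = & S.R \cup E.S \cup \Bigl( \bigcup_{q} E.O_q \Bigr) \setminus \Bigl( \bigcup_{q \neq p} E.O_q \setminus E.S \Bigr) & \text{by } oinv(E).(ii) \\
 & = & S.R \cup E.S \cup \Bigl( \bigcup_{q} E.O_q \setminus E.S \Bigr) \setminus \Bigl( \bigcup_{q \neq p} E.O_q \setminus E.S \Bigr) & \\
 & = & S.R \cup E.S \cup (E.O_p \setminus E.S) & \text{by } oinv(E).(i)
\end{array}
\]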
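The set identity of Lemma 2.8 can be checked on a small example. The Python sketch below is an illustration only: addresses are modelled as integers, the names `so` and `lemma_2_8_holds` are hypothetical, and the three conditions assumed for the toy ownership state (pairwise disjoint ownership sets, read-only addresses that are neither owned nor shared, and every address being read-only, shared or owned) are inferred from how the proof uses the parts of oinv rather than copied from its definition.

```python
from typing import Dict, Set


def so(owned: Dict[int, Set[int]], shared: Set[int], p: int) -> Set[int]:
    """SO(G, p): owned, non-shared addresses of all units other than p."""
    result: Set[int] = set()
    for q, addrs in owned.items():
        if q != p:
            result |= addrs - shared
    return result


def lemma_2_8_holds(all_addrs: Set[int], read_only: Set[int],
                    owned: Dict[int, Set[int]], shared: Set[int], p: int) -> bool:
    """Compare both sides of the identity stated in Lemma 2.8."""
    lhs = all_addrs - so(owned, shared, p)
    rhs = read_only | shared | (owned[p] - shared)
    return lhs == rhs


# Toy ownership state satisfying the assumed invariant conditions.
ALL_ADDRS = set(range(10))
READ_ONLY = {0, 1}
SHARED = {2, 3, 4}
OWNED = {0: {2, 5, 6}, 1: {3, 7}, 2: {8, 9}}
```

For unit 0 the excluded set is `{7, 8, 9}`, and both sides of the identity evaluate to `{0, ..., 6}`. If two units owned the same address, the identity would fail, which shows why the disjointness part of the invariant is needed.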
Now, using the sequential simulation relation from the framework $R^{S_a}_{S_c}$, we define the overall
concurrent simulation relation for all units in consistency points.
Definition 2.43 (Concurrent Simulation Relation). For a Cosmos machine configuration
$E \in C_{S_a}$, a concrete concurrent machine configuration $d \in M_{S_c}$, the simulation parameter par ∈ P,
and a unit $p \in \mathbb{N}_{nu}$ we define its sequential simulation relation in the context of concurrent
execution:
\[ csim_p(d, E, par) \overset{def}{\equiv} sim_p(d, E.M, par, SO(E.\mathcal{G}, p)) \]
Then, for all units in consistency points we demand
\[ csim(d, E, par) \overset{def}{\equiv} \forall p \in U_{cp}(d, par).\ csim_p(d, E, par) \]
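The way the concurrent relation instantiates the sequential one with the excluded address set can be sketched in Python as follows. Everything here is a toy assumption made for illustration: memories are dictionaries from integer addresses to values, `toy_sim` (agreement of the two memories outside the excluded set) stands in for the framework-specific relation sim and ignores the unit and the simulation parameter, and the set of units in consistency points is passed in directly.

```python
from typing import Callable, Dict, Iterable, Set

Memory = Dict[int, int]


def so(owned: Dict[int, Set[int]], shared: Set[int], p: int) -> Set[int]:
    """SO(G, p): owned, non-shared addresses of all units other than p."""
    result: Set[int] = set()
    for q, addrs in owned.items():
        if q != p:
            result |= addrs - shared
    return result


def toy_sim(d_mem: Memory, e_mem: Memory, icm: Set[int]) -> bool:
    """Toy simulation relation (an assumption): the memories agree outside icm."""
    addrs = (set(d_mem) | set(e_mem)) - icm
    return all(d_mem.get(a) == e_mem.get(a) for a in addrs)


def csim(d_mem: Memory, e_mem: Memory, owned: Dict[int, Set[int]], shared: Set[int],
         units_in_cp: Iterable[int],
         sim: Callable[[Memory, Memory, Set[int]], bool]) -> bool:
    """Require sim for every unit in a consistency point, excluding other units' local addresses."""
    return all(sim(d_mem, e_mem, so(owned, shared, p)) for p in units_in_cp)
```

With ownership `{0: {10}, 1: {11}}`, a mismatch at address 11 does not disturb the relation for unit 0, because that address is local to unit 1 and therefore excluded for unit 0.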
Since the concrete and abstract Cosmos machines contain the ownership states, we are also
supposed to couple them in our theorem. Additionally, we are interested in the separate simu-
lation relation between shared and read-only memories. We will claim such coupling not only
for the units in consistency points, but also for those that have not reached them. For this purpose
we provide a so-called shared invariant introduced in [Bau14b].
Definition 2.44 (Shared Invariant). For concrete and abstract Cosmos machines $S_c$, $S_a$, and
the sequential simulation framework $R^{S_a}_{S_c}$ the shared invariant $sinv^{S_a}_{S_c}$ couples and constrains
the ownership states as well as the shared and read-only memories of both machines. For
$M_x = A_x \rightharpoonup V_x$ and $O_x = \mathbb{N}_{nu} \to 2^{A_x}$ with $x \in \{c, a\}$ we have
\[ sinv^{S_a}_{S_c} : (M_c \times 2^{A_c} \times 2^{A_c} \times O_c) \times (M_a \times 2^{A_a} \times 2^{A_a} \times O_a) \times P \to \mathbb{B} \]
Moreover, we introduce a shorthand asserting the shared invariant on two Cosmos machine
configurations D and E. Let $G(C) \equiv (C.m|_{C.S \cup S.R}, C.S, S.R, C.O)$, then we define
\[ sinv(D, E, par) \equiv sinv^{S_a}_{S_c}(G(D), G(E), par) \]
One can easily prove an important property about the shared invariant preservation in case
of local steps by the concrete machine [Bau14b].
Lemma 2.9 (Safe Local Steps Preserve Shared Invariant). Consider concrete Cosmos machine
configurations $D, D' \in C_{S_c}$ such that D′ is reached from D by a step sequence ω containing no IO-steps.
Moreover, $\hat{E} \in C_{S_a}$ is an abstract machine configuration corresponding to the beginning or the
end of the computation (D, ω). If the given concrete computation is ownership safe, then D and $\hat{E}$ are
coupled by the shared invariant if and only if D′ and $\hat{E}$ are.
\[ D \overset{\omega}{\longmapsto} D' \land \omega|_{io} = \varepsilon \land safe(D, \omega) \implies (sinv(D, \hat{E}, par) \iff sinv(D', \hat{E}, par)) \]
We will state the concurrent simulation theorem for two machines with initial configurations
D ∈ CSc and E ∈ CSa that are coupled via csim(D.M,E, par) and sinv(D,E, par). However,
it is also typical to require some conditions on the initial configuration of the machine imple-
menting the abstract one. We resolve this issue, not considered in [Bau14b], with the help of an
additional invariant for the units on the concrete level.
Definition 2.45 (Concrete Machine Unit Invariant). We introduce a concrete machine unit invariant
that may require some conditions on the unit’s configuration and couple it with the
ownership configuration if needed:
\[ uinv^{S_a}_{S_c} : U_c \times 2^{A_c} \times 2^{A_c} \to \mathbb{B} \]
For a unit $p \in \mathbb{N}_{nu}$ and a Cosmos machine configuration $D \in C_{S_c}$ we give the shorthand
\[ uinv_p(D) \equiv uinv^{S_a}_{S_c}(D.u_p, D.O_p, D.S) \]
Moreover, we demand the given invariant to hold only for units being in consistency points:
\[ uinv(D, par) \equiv \forall p \in U_{cp}(D.M, par).\ uinv_p(D) \]
As an example of a simulation needing such an invariant, one could consider a concrete machine
whose units buffer memory write accesses, although this buffering is invisible on the abstract level.
A simple solution allowing the simulation might require the buffers of all units to contain only
their local owned addresses.
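The store-buffer example can be made concrete with a short Python sketch of such a unit invariant. The buffer encoding as a list of pending (address, value) writes and the name `uinv_store_buffer` are assumptions for illustration; "local owned" is read here as owned by the unit and not shared.

```python
from typing import List, Set, Tuple

StoreBuffer = List[Tuple[int, int]]   # assumed encoding: pending (address, value) writes


def uinv_store_buffer(buffer: StoreBuffer, owned_p: Set[int], shared: Set[int]) -> bool:
    """Every buffered write targets an address owned by the unit and not shared."""
    local = owned_p - shared
    return all(addr in local for addr, _ in buffer)
```

The arguments mirror the shape of the unit invariant above: a part of the unit's configuration, the unit's ownership set, and the set of shared addresses.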
Cosmos Model Simulation Theorem
Now, having the sequential simulation framework $R^{S_a}_{S_c}$, shared invariant $sinv^{S_a}_{S_c}$, unit invariant
$uinv^{S_a}_{S_c}$ if needed, and a property $P_{S_a} : C_{S_a} \to \mathbb{B}$ restricting the ownership safety policy in a
desired way, we can state the main Cosmos model simulation theorem.
Theorem 2.4 (Cosmos Model Simulation Theorem). For any starting configurations $D \in C_{S_c}$ and
$E \in C_{S_a}$ of the concrete and abstract Cosmos machines respectively, if
• there exist units that are in consistency points in D, and all such units have well-formed configu-
rations and obey the unit invariant,
• all units in E are in consistency points,
• the shared invariant between D and E holds, and the units in consistency points in D are coupled
with E via the simulation relation wrt. the simulation parameter par and the ownership state of
the abstract machine, and
• the execution of the abstract machine starting in E is verified wrt. the software conditions,
well-formedness, ownership safety policy, and the property $P_{S_a}$ restricting it,
then for any suitable consistency block machine schedule $\kappa \in (\Theta^*_{S_c})^*$ with non-empty blocks executable
from D.M, there exists a complete block machine schedule $\nu \in (\Theta^*_{S_a})^*$ equivalent to κ according to
$\nu \overset{sch}{\sim} \kappa$ and executable from E.M such that
• the computation (E.M, ν) leads to well-formed configurations of all units in the abstract machine
and is simulated by the well-behaved computation (D.M, κ) ending in a configuration with well-
formed units in consistency points, and
• for any ownership annotation that is safe for (E.M, ν) wrt. the property PSa we guarantee that
the resulting configurations of both machines are coupled with the concurrent simulation relation.
• Moreover, we can transfer the ownership safety to the implementation layer such that the unit
invariant in consistency points as well as the shared invariant between the abstract and concrete
Cosmos machines are preserved at the end of the computations.
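\[
\begin{array}{l}
\forall D \in C_{S_c}, E \in C_{S_a}, par \in P, \kappa \in (\Theta^*_{S_c})^*. \\
\quad (i)\ \exists p \in \mathbb{N}_{nu}.\ CP_p(D.M, par) \\
\quad (ii)\ uinv(D, par) \land \forall p \in U_{cp}(D.M, par).\ wf_p(D.M) \\
\quad (iii)\ \forall p \in \mathbb{N}_{nu}.\ CP_p(E.M, par) \\
\quad (iv)\ csim(D.M, E, par) \land sinv(D, E, par) \\
\quad (v)\ safety^{c}_{B}(E, P_{S_a}, par) \land scwf^{c}_{B}(E.M, par) \\
\quad (vi)\ CPsched(\kappa, par) \land suit(\lfloor\kappa\rfloor) \land comp(D.M, \lfloor\kappa\rfloor) \land \forall \omega \in \kappa.\ \omega \neq \varepsilon \\
\implies \\
\exists \nu \in (\Theta^*_{S_a})^*, M'_e \in M_{S_a}, M'_d \in M_{S_c}. \\
\quad (i)\ CPsched^{c}(E.M, \nu, par) \land \nu \overset{sch}{\sim} \kappa \\
\quad (ii)\ E.M \overset{\nu}{\longmapsto} M'_e \land \forall p \in \mathbb{N}_{nu}.\ wf_p(M'_e) \\
\quad (iii)\ D.M \overset{\kappa}{\longmapsto} M'_d \land wb(D.M, \lfloor\kappa\rfloor, par) \land \forall p \in U_{cp}(M'_d, par).\ wf_p(M'_d) \\
\quad (iv)\ \forall o_\nu \in (\Omega_{S_a})^*, \mathcal{G}'_e \in \mathcal{G}_{S_a}. \\
\qquad E \overset{\langle \lfloor\nu\rfloor, o_\nu \rangle}{\longmapsto} (M'_e, \mathcal{G}'_e) \land safe^{c}_{B}(E, P_{S_a}, \nu, o_\nu) \implies \\
\qquad\quad (a)\ csim(M'_d, (M'_e, \mathcal{G}'_e), par) \\
\qquad\quad (b)\ \exists o_\kappa \in (\Omega_{S_c})^*, \mathcal{G}'_d \in \mathcal{G}_{S_c}. \\
\qquad\qquad (b.i)\ D \overset{\langle \lfloor\kappa\rfloor, o_\kappa \rangle}{\longmapsto} (M'_d, \mathcal{G}'_d) \land safe(D, \langle \lfloor\kappa\rfloor, o_\kappa \rangle) \\
\qquad\qquad (b.ii)\ uinv((M'_d, \mathcal{G}'_d), par) \\
\qquad\qquad (b.iii)\ sinv((M'_d, \mathcal{G}'_d), (M'_e, \mathcal{G}'_e), par)
\end{array}
\]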
As we see in the theorem, a possibly incomplete consistency block machine execution of Sc
implements a complete consistency block computation of Sa. For units that have not reached
the consistency points we do not require the simulation to hold. However, for such intermediate
states the shared invariant is preserved.
We treat the incomplete blocks depending on whether they contain IO-steps or not. If such a
block contains only local steps, it cannot change the shared memory and the ownership state.
Therefore, we omit it in the abstract computation and represent it there as an empty step. Otherwise,
we have to reflect the same changes in the abstract model. In this case, applying the
general sequential simulation theorem we find a corresponding complete block of the abstract
machine.
Before the proof of the theorem we need to state a few assumptions in addition to the sequential
simulation theorem. The assumptions basically require that we can transfer the ownership
safety to the concrete level, and that when a particular unit is stepping, it preserves the simulation
for other units and does not destroy their properties.
Assumption 2.1 (Safety Transfer with sinv and uinv Preservation for a Stepping Unit). For
any well-behaved complete consistency block computation (D.M, σ) of a unit p starting in a Cosmos
machine configuration D where the unit invariant is obeyed, and implementing a corresponding compu-
tation (E.M, τ) of the abstract machine such that
• the shared invariant between E and D holds,
• the abstract block computation is safe wrt. a given ownership annotation and the property PSa ,
• and the unit p in both machines is coupled with the sequential simulation relation before and after
the computations,
we can find an ownership annotation for σ, such that the annotated concrete computation is ownership
safe and preserves the shared and unit invariants.
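\[
\begin{array}{l}
\forall D \in C_{S_c}, d' \in M_{S_c}, E, E' \in C_{S_a}, \sigma \in \Theta^*_{S_c}, \tau \in \Theta^*_{S_a}, o_\tau \in \Omega^*_{S_a}, p \in \mathbb{N}_{nu}, par \in P. \\
\quad (i)\ D.M \overset{\sigma}{\longmapsto} d' \land blk(\sigma, p) \land oneIO(\sigma, \tau) \land wb(D.M, \sigma, par) \land wf_p(d') \land uinv_p(D) \\
\quad (ii)\ E \overset{\langle \tau, o_\tau \rangle}{\longmapsto} E' \land blk(\tau, p) \land P_{S_a}(E) \land safe_{P_{S_a}}(E, \langle \tau, o_\tau \rangle) \land sc(E.M, \tau, par) \land wf_p(E'.M) \\
\quad (iii)\ csim_p(D.M, E, par) \land sinv(D, E, par) \land csim_p(d', E', par) \\
\implies \\
\exists o_\sigma \in (\Omega_{S_c})^*, \mathcal{G}' \in \mathcal{G}_{S_c}. \\
\quad (i)\ D \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d', \mathcal{G}') \land safe(D, \langle \sigma, o_\sigma \rangle) \land uinv_p((d', \mathcal{G}')) \\
\quad (ii)\ sinv((d', \mathcal{G}'), E', par)
\end{array}
\]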
Note that, in contrast to the same assumption in [Bau14b], along with the introduction of the
unit invariant, we additionally assume that the well-formedness predicates for both machines
hold at the end of the consistency block computations. This extension is needed for the proof
of the shared invariant that may guarantee well-formedness properties of the shared memories
coupled between two machines. Moreover, the steps of the abstract machine have to be safe
wrt. $P_{S_a}$.
As we know according to Lemma 2.5 any ownership safe steps of a unit do not change other
units’ configurations. We use this fact to make the assumptions about the property and simula-
tions preservation for other units present in the considered machines.
Assumption 2.2 (Preservation of Well-Formedness for Other Units of Abstract Machine). The
well-formedness predicate of the abstract machine for a given unit depends only on the local state of this
unit and the memory covered by the shared invariant.
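\[
\begin{array}{l}
\forall E, E' \in C_{S_a}, p \in \mathbb{N}_{nu}, par \in P. \\
\quad (i)\ wf_p(E.M) \land E \approx_p E' \\
\quad (ii)\ sinv(D', E', par) \\
\quad (iii)\ oinv(E) \land P_{S_a}(E) \land oinv(E') \land P_{S_a}(E') \\
\implies wf_p(E'.M)
\end{array}
\]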
This assumption differs from the same one stated in [Bau14b] by the additional premises $P_{S_a}$ and oinv,
which allow deriving the needed properties about owned and shared components and the memory in
configurations E and E′.
Assumption 2.3 (Preservation of Well-Formedness for Other Units of Concrete Machine).
Analogously to Assumption 2.2, the well-formedness predicate of the concrete machine depends only on
the local state of its units and the memory covered by the shared invariant.
\[
\begin{array}{l}
\forall D, D' \in C_{S_c}, E, E' \in C_{S_a}, p \in \mathbb{N}_{nu}, par \in P. \\
\quad (i)\ wf_p(D.M) \land D \approx_p D' \\
\quad (ii)\ csim_p(D.M, E, par) \land sinv(D, E, par) \land sinv(D', E', par) \\
\quad (iii)\ oinv(E) \land P_{S_a}(E) \land oinv(E') \land P_{S_a}(E') \\
\implies wf_p(D'.M)
\end{array}
\]
In contrast to [Bau14b] this assumption contains in its premises not only PSa and oinv for the
abstract machine, but also the simulation relation for the well-formed sequential machine con-
figuration and the shared invariant.
Assumption 2.4 (Preservation of Simulation Relation and uinv for Other Units). The sequential
simulation relation and the unit invariant for a unit p may depend on the unit’s configuration, read-only,
shared, and owned by p memories as well as its ownership state and the property PSa adjusting it.
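\[
\begin{array}{l}
\forall D, D' \in C_{S_c}, E, E' \in C_{S_a}, p \in \mathbb{N}_{nu}, par \in P. \\
\quad (i)\ csim_p(D.M, E, par) \land uinv_p(D) \\
\quad (ii)\ sinv(D, E, par) \land P_{S_a}(E) \land oinv(E) \land oinv(D) \\
\quad (iii)\ sinv(D', E', par) \land P_{S_a}(E') \land oinv(E') \land oinv(D') \\
\quad (iv)\ E \approx_p E' \land D \approx_p D' \\
\implies csim_p(D'.M, E', par) \land uinv_p(D')
\end{array}
\]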
Note again that this assumption differs from [Bau14b] similarly to Assumption 2.3 considered
above and gives more flexibility for the simulation relation and properties to be instantiated.
Though it might be clearer to talk about the well-formedness and the unit invariant preservation
for the concrete machine independently from the abstract machine, in this case one
would have to transfer $P_{S_a}$ explicitly to some properties on the concrete level in a general way,
which is not required by the concurrent simulation theorem and would make it less intuitive.
Such explicit property transfer will be shown separately later in this chapter. Moreover, in this
thesis we will see how one can use the simulation and these assumptions for different systems.
Proof of Cosmos Model Simulation Theorem: We show the claim by induction on the length
of consistency block schedule n = |κ|.
Base case (n = 0): For κ = ε we set ν = ε, $M'_e = E.M$, and $M'_d = D.M$. The claims (i)–(iii)
obviously follow from the premises. For the claim (iv) we can only consider $o_\nu = \varepsilon$ and
$\mathcal{G}'_e = E.\mathcal{G}$, which gives us the concurrent simulation relation for D.M and E. Applying Assumption 2.1
for d′ = D.M, E′ = E, $\sigma = \tau = o_\tau = \varepsilon$ and the theorem premises we also easily conclude the rest
of the theorem statement. Note that in this case the ownership safety on the concrete level boils
down to $safe(D, \varepsilon) = oinv(D.\mathcal{G})$, which follows directly from Assumption 2.1.
Induction hypothesis (IH): We assume the claim holds for any consistency block machine
schedule $\bar{\kappa}$ with a fixed length n = m. Thus, there exists a simulated complete consistency block
machine computation $(E.M, \bar{\nu})$ such that all the desired properties hold. Moreover, $(D.M, \bar{\kappa})$ is
safe wrt. ownership.
Induction step (m → m + 1): Assume that we are given a block machine computation (D.M, κ)
with |κ| = m + 1. We denote the last block by $\omega = \kappa_{m+1}$ and the previous blocks by $\bar{\kappa} = \kappa[1:m]$,
i.e., $\kappa = \bar{\kappa}\omega$. Instantiating the induction hypothesis with the theorem hypotheses for κ, and setting
up ν for the first m blocks such that $\nu[1:m] = \bar{\nu}$, we get the m-step consistency block machine
computations $(D.M, \bar{\kappa})$ and $(E.M, \bar{\nu})$ leading to the machine states $\bar{M}_d$ and $\bar{M}_e$ respectively and
fulfilling the claims of the simulation theorem:
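\[ CPsched^{c}(E.M, \nu[1:m], par) \land \nu[1:m] \overset{sch}{\sim} \kappa[1:m] \tag{2.1} \]
\[ E.M \overset{\nu[1:m]}{\longmapsto} \bar{M}_e \land \forall p \in \mathbb{N}_{nu}.\ wf_p(\bar{M}_e) \tag{2.2} \]
\[ D.M \overset{\kappa[1:m]}{\longmapsto} \bar{M}_d \land wb(D.M, \lfloor\kappa[1:m]\rfloor, par) \land \forall p \in U_{cp}(\bar{M}_d, par).\ wf_p(\bar{M}_d) \tag{2.3} \]
\[
\begin{array}{l}
\forall o_{\bar{\nu}} \in (\Omega_{S_a})^*, \bar{\mathcal{G}}_e \in \mathcal{G}_{S_a}. \\
\quad E \overset{\langle \lfloor\nu[1:m]\rfloor, o_{\bar{\nu}} \rangle}{\longmapsto} (\bar{M}_e, \bar{\mathcal{G}}_e) \land safe^{c}_{B}(E, P_{S_a}, \nu[1:m], o_{\bar{\nu}}) \implies \\
\qquad (a)\ csim(\bar{M}_d, (\bar{M}_e, \bar{\mathcal{G}}_e), par) \\
\qquad (b)\ \exists o_{\bar{\kappa}} \in (\Omega_{S_c})^*, \bar{\mathcal{G}}_d \in \mathcal{G}_{S_c}. \\
\qquad\quad (b.i)\ D \overset{\langle \lfloor\kappa[1:m]\rfloor, o_{\bar{\kappa}} \rangle}{\longmapsto} (\bar{M}_d, \bar{\mathcal{G}}_d) \land safe(D, \langle \lfloor\kappa[1:m]\rfloor, o_{\bar{\kappa}} \rangle) \\
\qquad\quad (b.ii)\ uinv((\bar{M}_d, \bar{\mathcal{G}}_d), par) \\
\qquad\quad (b.iii)\ sinv((\bar{M}_d, \bar{\mathcal{G}}_d), (\bar{M}_e, \bar{\mathcal{G}}_e), par)
\end{array}
\tag{2.4}
\]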
Therefore we only need to take care of the possibly incomplete last block ω, which is non-empty
by the hypothesis and is executed by some unit $q = \omega_1.s$. From the theorem hypothesis
(vi) we know that the final machine state for the block computation (D.M, κ) exists and we
denote it d′. Moreover, $D.M \overset{\kappa[1:m]}{\longmapsto} \bar{M}_d$ leads to the intermediate state $\bar{M}_d$, so that we also have
$\bar{M}_d \overset{\omega}{\longmapsto} d'$.
To apply the generalized sequential simulation theorem for the computation $(\bar{M}_d, \omega)$ and
the abstract machine configuration $\bar{M}_e$ one needs to choose for the unit q the set of memory
addresses excluded from its simulation relation and not accessed by the unit during this block
execution. The choice of this memory region depends on the simulation relation holding at the
beginning of execution of the block ω and addresses not accessed by the unit of the abstract
machine for any consistency block starting from $\bar{M}_e$.
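From the theorem hypothesis (v) $safety^{c}_{B}(E, P_{S_a}, par)$ and the existing computation
$E.M \overset{\nu[1:m]}{\longmapsto} \bar{M}_e$ we know that there exist ownership annotations $o_{\bar{\nu}}$ such that the transitions
$E \overset{\langle \lfloor\nu[1:m]\rfloor, o_{\bar{\nu}} \rangle}{\longmapsto} (\bar{M}_e, \bar{\mathcal{G}}_e)$ lead to corresponding ownership states $\bar{\mathcal{G}}_e$ and the simulation relation
for both machines holds by induction hypothesis (2.4a):
\[ \forall p \in U_{cp}(\bar{M}_d, par).\ csim_p(\bar{M}_d, (\bar{M}_e, \bar{\mathcal{G}}_e), par). \]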
From the theorem hypothesis (vi) $CPsched(\kappa[1:m]\omega, par)$ the unit q is obviously at a consistency
point in configuration $\bar{M}_d$, therefore
\[ sim_q(\bar{M}_d, \bar{M}_e, par, SO(\bar{\mathcal{G}}_e, q)) \]
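So, the sequential simulation theorem applied to the considered case with $icm = SO(\bar{\mathcal{G}}_e, q)$
takes the form:
\[
\begin{array}{l}
\quad (i)\ wf_q(\bar{M}_d) \land wf_q(\bar{M}_e) \\
\quad (ii)\ sim_q(\bar{M}_d, \bar{M}_e, par, icm) \land CP_q(\bar{M}_d, par) \land CP_q(\bar{M}_e, par) \\
\quad (iii)\ \forall \lambda \in \Theta^*_{S_a}, \hat{e} \in M_{S_a}.\ \bar{M}_e \overset{\lambda}{\longmapsto} \hat{e} \land blk(\lambda, q) \land CP_q(\hat{e}, par) \implies \\
\qquad\quad noacc(\bar{M}_e, \lambda, icm) \land sc(\bar{M}_e, \lambda, par) \land wf_q(\hat{e}) \\
\quad (iv)\ \omega \neq \varepsilon \land blk(\omega, q) \land suit(\omega) \land comp(\bar{M}_d, \omega) \\
\implies \\
\exists \sigma \in \Theta^*_{S_c}, d'' \in M_{S_c}, \tau \in \Theta^*_{S_a}, e'' \in M_{S_a}. \\
\quad (i)\ \omega \rhd^{q}_{blk} \sigma \land suit(\sigma) \\
\quad (ii)\ blk(\tau, q) \land oneIO(\sigma, \tau) \\
\quad (iii)\ \bar{M}_d \overset{\sigma}{\longmapsto} d'' \land wb(\bar{M}_d, \sigma, par) \land wf_q(d'') \\
\quad (iv)\ \bar{M}_e \overset{\tau}{\longmapsto} e'' \land wf_q(e'') \\
\quad (v)\ sim_q(d'', e'', par, icm) \land CP_q(d'', par) \land CP_q(e'', par)
\end{array}
\]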
The premises (i)–(ii) of the applied sequential simulation theorem follow directly from the
equations (2.1)–(2.4) and the fact that the unit q is at a consistency point in $\bar{M}_d$, which is known
from CPsched(κ, par). Since the simulation relation for q for the chosen icm holds as proven
above and by (2.1) the unit is also at a consistency point in $\bar{M}_e$, one trivially gets (ii). We
also easily conclude the premise (iv) from the hypothesis (vi) of the Cosmos model simulation
theorem to be proven for $\kappa = \bar{\kappa}\omega$.
To prove premise (iii) we use the theorem hypothesis (v). So, from $safety^{c}_{B}(E, P_{S_a}, par)$ for
all such existing complete consistency block machine computations $(\bar{M}_e, \lambda)$ of unit q leading
to $\hat{e}$ there exist ownership safe annotations $o_\lambda$ such that computations
$(\bar{M}_e, \bar{\mathcal{G}}_e) \overset{\langle \lambda, o_\lambda \rangle}{\longmapsto} (\hat{e}, \hat{\mathcal{G}}_e)$ till
some $\hat{\mathcal{G}}_e$ are feasible. Due to the safe access and ownership transfer policy during $\langle \lambda, o_\lambda \rangle$ the
unit q does not access local owned addresses $SO(\bar{\mathcal{G}}_e, q)$ of other units and cannot modify them.
This fact follows from the inductive application of Lemma 2.5 for the steps of $\langle \lambda, o_\lambda \rangle$. Therefore,
for such $(\bar{M}_e, \lambda)$ one guarantees that $noacc(\bar{M}_e, \lambda, SO(\bar{\mathcal{G}}_e, q))$ holds. From $scwf^{c}_{B}(E.M, par)$ we
easily conclude the rest of the premise (iii).
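Now we can use the claim of the applied sequential simulation theorem. We extend the block
ω by steps ω′ in order to obtain σ = ωω′. Thus the sequential simulation guarantees for the
unit q the existence of computations $\bar{M}_e \overset{\tau}{\longmapsto} e''$ and
$\bar{M}_d \overset{\omega}{\longmapsto} d' \overset{\omega'}{\longmapsto} d''$ such that the simulation
relation excluding $SO(\bar{\mathcal{G}}_e, q)$ holds. Moreover, we choose ownership annotations $o_\tau$ such that
computations $(\bar{M}_e, \bar{\mathcal{G}}_e) \overset{\langle \tau, o_\tau \rangle}{\longmapsto} (e'', \mathcal{G}''_e)$ for corresponding $\mathcal{G}''_e$ are safe wrt. the ownership and the
property $P_{S_a}$, i.e., $P_{S_a}((\bar{M}_e, \bar{\mathcal{G}}_e)) \land safe_{P_{S_a}}((\bar{M}_e, \bar{\mathcal{G}}_e), \langle \tau, o_\tau \rangle)$ holds.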
First, we prove our theorem for ν[1 : m]τ and κ[1 : m]σ. As the second step, we will look at
ω′ because the theorem is originally stated for non-extended ω.
Using equation (2.1) we easily conclude claim (i) of the theorem, namely
\[ CPsched^{c}(E.M, \nu[1:m]\tau, par) \land \nu[1:m]\tau \overset{sch}{\sim} \kappa[1:m]\sigma, \]
because τ is a complete consistency block of the unit q, oneIO(σ, τ) holds, all units were at
consistency points in $\bar{M}_e$, and predicate CP depends only on a unit’s machine configuration
and read-only memory that were not changed for units p ≠ q during safe steps of q in τ.
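By now, from equations (2.2) – (2.3) and the sequential simulation theorem we also have
\[
\begin{array}{lll}
D.M \overset{\kappa[1:m]\sigma}{\longmapsto} d'' & wf_q(d'') & wb(D.M, \lfloor\kappa[1:m]\sigma\rfloor, par) \\
E.M \overset{\nu[1:m]\tau}{\longmapsto} e'' & wf_q(e'') & sim_q(d'', e'', par, SO(\bar{\mathcal{G}}_e, q))
\end{array}
\]
Moreover, as mentioned before, by Lemma 2.5 for the safe computation
$(\bar{M}_e, \bar{\mathcal{G}}_e) \overset{\langle \tau, o_\tau \rangle}{\longmapsto} (e'', \mathcal{G}''_e)$
we have $SO(\bar{\mathcal{G}}_e, q) = SO(\mathcal{G}''_e, q)$ and conclude $sim_q(d'', e'', par, SO(\mathcal{G}''_e, q))$.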
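To prove the rest of the claims we first apply Assumption 2.1 for τ and σ:
\[
\begin{array}{l}
\quad (i)\ \bar{M}_d \overset{\sigma}{\longmapsto} d'' \land blk(\sigma, q) \land oneIO(\sigma, \tau) \land wb(\bar{M}_d, \sigma, par) \land wf_q(d'') \land uinv_q((\bar{M}_d, \bar{\mathcal{G}}_d)) \\
\quad (ii)\ (\bar{M}_e, \bar{\mathcal{G}}_e) \overset{\langle \tau, o_\tau \rangle}{\longmapsto} (e'', \mathcal{G}''_e) \land blk(\tau, q) \land P_{S_a}((\bar{M}_e, \bar{\mathcal{G}}_e)) \land {} \\
\qquad\quad safe_{P_{S_a}}((\bar{M}_e, \bar{\mathcal{G}}_e), \langle \tau, o_\tau \rangle) \land sc(\bar{M}_e, \tau, par) \land wf_q(e'') \\
\quad (iii)\ sim_q(\bar{M}_d, \bar{M}_e, par, SO(\bar{\mathcal{G}}_e, q)) \land sinv((\bar{M}_d, \bar{\mathcal{G}}_d), (\bar{M}_e, \bar{\mathcal{G}}_e), par) \land {} \\
\qquad\quad sim_q(d'', e'', par, SO(\mathcal{G}''_e, q)) \\
\implies \\
\exists o_\sigma \in (\Omega_{S_c})^*, \mathcal{G}''_d \in \mathcal{G}_{S_c}. \\
\quad (i)\ (\bar{M}_d, \bar{\mathcal{G}}_d) \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d'', \mathcal{G}''_d) \land safe((\bar{M}_d, \bar{\mathcal{G}}_d), \langle \sigma, o_\sigma \rangle) \land uinv_q((d'', \mathcal{G}''_d)) \\
\quad (ii)\ sinv((d'', \mathcal{G}''_d), (e'', \mathcal{G}''_e), par)
\end{array}
\]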
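Obviously, from (2.4) the shared invariant holds and we have $uinv_q((\bar{M}_d, \bar{\mathcal{G}}_d))$ because the unit q
is in a consistency point. All the other premises follow directly from the argumentation above.
From now on we can consider the ownership safe computation
$(\bar{M}_d, \bar{\mathcal{G}}_d) \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d'', \mathcal{G}''_d)$ with the
ownership annotation $o_\sigma$. Moreover, we may concatenate it with $o_{\bar{\kappa}}$ so that we easily get
\[ D \overset{\langle \lfloor\bar{\kappa}\sigma\rfloor, o_{\bar{\kappa}} o_\sigma \rangle}{\longmapsto} (d'', \mathcal{G}''_d) \land safe(D, \langle \lfloor\bar{\kappa}\sigma\rfloor, o_{\bar{\kappa}} o_\sigma \rangle) \]
What remains to be proven is the preservation of well-formedness, the unit invariant and the
sequential simulation relation for other units p ≠ q.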
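Applying Lemma 2.5 for the steps of $(\bar{M}_e, \bar{\mathcal{G}}_e) \overset{\langle \tau, o_\tau \rangle}{\longmapsto} (e'', \mathcal{G}''_e)$ and
$(\bar{M}_d, \bar{\mathcal{G}}_d) \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d'', \mathcal{G}''_d)$ we get
\[ \forall p \neq q.\ (\bar{M}_e, \bar{\mathcal{G}}_e) \approx_p (e'', \mathcal{G}''_e) \land (\bar{M}_d, \bar{\mathcal{G}}_d) \approx_p (d'', \mathcal{G}''_d) \]
From (2.4) (a) and (b.ii) we know
\[ \forall p \in U_{cp}(\bar{M}_d, par).\ sim_p(\bar{M}_d, \bar{M}_e, par, SO(\bar{\mathcal{G}}_e, p)) \land uinv_p((\bar{M}_d, \bar{\mathcal{G}}_d)) \]
Naturally, the ownership states $\mathcal{G}''_d$ and $\mathcal{G}''_e$ support the ownership invariant oinv by Lemma 2.1.
Moreover, the shared invariant holds at the beginning and the end of the considered computations.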
Therefore, since we have $\forall p \in \mathbb{N}_{nu}.\ wf_p(\bar{M}_e)$ and $\forall p \in U_{cp}(\bar{M}_d, par).\ wf_p(\bar{M}_d)$ in equations
(2.2) – (2.3), we use Assumptions 2.2 and 2.3 for both machines in order to conclude the
well-formedness of units p ≠ q in e′′ and d′′.
Finally, we show that the simulation relations $sim_p(d'', e'', par, SO(\mathcal{G}''_e, p))$ and unit invariants
$uinv_p((d'', \mathcal{G}''_d))$ for other units $p \in U_{cp}(d'', par)$ with p ≠ q are preserved. Applying
Assumption 2.4 for each unit p we finish the proof and conclude that the theorem claims hold for
ν[1 : m]τ and κ[1 : m]σ.
Now, for the computation $\bar{M}_d \overset{\omega}{\longmapsto} d' \overset{\omega'}{\longmapsto} d''$ we split cases on ω′ and prove the theorem for
the computation (D.M, κ) with $\kappa_{m+1} = \omega$ leading to configuration $M'_d = d'$.
Case 1: ω′ = ε. In this case ω is already complete, d′ = d′′, and we just set $\nu = \bar{\nu}\tau$, $M'_e = e''$ for
the theorem to hold.
Case 2: ω′ ≠ ε ∧ ω|io ≠ ε. The consistency block ω already contains an IO-operation and is
incomplete. In this case we include τ into the abstract model computation (E.M, ν) leading to
$M'_e = e''$ because one has to maintain the shared invariant.¹
Since prefixes of sequences cannot contain more IO-points than the original ones, the predicate
oneIO(ω, τ) holds, and we show claim (i) of the theorem as in the previous case.
Obviously, from the sequential simulation theorem we already know $wf_q(e'')$. For the unit q
of the concrete machine we do not have to guarantee the well-formedness and the simulation
relation because it has not reached a consistency point after the computation (D.M, κ). From
$wb(D.M, \lfloor\kappa[1:m]\sigma\rfloor, par)$ proven before we also get $wb(D.M, \lfloor\kappa\rfloor, par)$.
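Having $safe((\bar{M}_d, \bar{\mathcal{G}}_d), \langle \sigma, o_\sigma \rangle)$ we can split $o_\sigma = o_\omega o'_\omega$ such that $|o_\omega| = |\omega|$, $|o'_\omega| = |\omega'|$.
Therefore, it follows that for some intermediate ownership state $\mathcal{G}'_d$ one has
\[ safe((\bar{M}_d, \bar{\mathcal{G}}_d), \langle \omega, o_\omega \rangle) \qquad safe((d', \mathcal{G}'_d), \langle \omega', o'_\omega \rangle) \]
We also know $\omega'|_{io} = \varepsilon$ because of $oneIO(\omega\omega', \tau)$ and $\omega|_{io} \neq \varepsilon$. Applying Lemma 2.9 for the
computation $(d', \mathcal{G}'_d) \overset{\langle \omega', o'_\omega \rangle}{\longmapsto} (d'', \mathcal{G}''_d)$ with $sinv((d'', \mathcal{G}''_d), (e'', \mathcal{G}''_e), par)$ we get
\[ sinv((d', \mathcal{G}'_d), (e'', \mathcal{G}''_e), par) \]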
Now, similarly to the previous case it is easy to apply Assumptions 2.2–2.4 to show the preservation
of the well-formedness, simulation relations, and unit invariants for the units p ≠ q still being
in consistency points in e′′ and d′.
Case 3: ω′ ≠ ε ∧ ω|io = ε. The incomplete consistency block ω contains only local steps of the unit
q. In this case we cannot include τ into the simulated computation (E.M, ν) because it could
contain an IO-step introduced only in ω′ that is not executed, and change the shared memory
state and the ownership so that the shared invariant would not hold after the computations.
Thus, we set $\nu_{m+1} = \varepsilon$ and obviously $M'_e = \bar{M}_e$, $\mathcal{G}'_e = \bar{\mathcal{G}}_e$.
Equation (2.2) gives us $\forall p \in \mathbb{N}_{nu}.\ wf_p(\bar{M}_e)$. Moreover, from $safe((\bar{M}_d, \bar{\mathcal{G}}_d), \langle \omega, o_\omega \rangle)$, $\omega|_{io} = \varepsilon$,
$sinv((\bar{M}_d, \bar{\mathcal{G}}_d), (\bar{M}_e, \bar{\mathcal{G}}_e), par)$ and Lemma 2.9 we also have
\[ sinv((d', \mathcal{G}'_d), (\bar{M}_e, \bar{\mathcal{G}}_e), par) \]
The argumentation about the remaining parts of the claims is analogous to the previous case.
This finishes the proof of the theorem.

¹ As a special case for ω|io ≠ ε one could consider ω|ot ≠ ε. According to oneIO(ωω′, τ) we allow τ|io = τ|ot = ε.
Therefore, for τ ≠ ε we could have set $\nu_{m+1} = \varepsilon$ instead of τ. However, since it would require more argumentation
based on technical lemmas from [Bau14b] not covered here, we stick to the shorter proof with $\nu_{m+1} = \tau$.
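The case distinction on ω′ in the induction step determines which abstract block is appended to the simulated schedule. The following Python sketch summarizes that choice; the step encoding (a label paired with an IO flag) and the function name `choose_abstract_block` are assumptions for illustration and do not appear in the thesis.

```python
from typing import List, Sequence, Tuple

Step = Tuple[str, bool]   # assumed encoding: (step label, is the step an IO step?)


def choose_abstract_block(omega: Sequence[Step], omega_ext: Sequence[Step],
                          tau: Sequence[Step]) -> List[Step]:
    """Select the abstract block for the last concrete block omega.

    omega_ext is the extension that completes omega to a full consistency block,
    tau is the abstract block provided by the sequential simulation theorem.
    """
    if len(omega_ext) == 0:
        return list(tau)      # Case 1: omega is already a complete block
    if any(is_io for _, is_io in omega):
        return list(tau)      # Case 2: incomplete, but the IO step was already executed
    return []                 # Case 3: incomplete with local steps only, use the empty block
```

Only in the third case is the abstract block left empty, because an IO step that would be executed only in the extension must not yet be reflected on the abstract level.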
2.4 Order Reduction on the Concrete Level
To apply the Cosmos model simulation theorem to an arbitrarily interleaved sequence of the concrete
model computations starting from D, one first has to apply the order reduction from the
arbitrary schedule to the block schedule. This allows us to guarantee that the ownership
safety as well as any other properties $P_{S_c} : C_{S_c} \to \mathbb{B}$ verified on the block schedule hold for the
given arbitrarily interleaved schedule.
Remember that the order reduction theorem states the safety transfer from safe IP-schedules
to arbitrary interleaved Cosmos machine schedules, and requires in the hypotheses that all IP-
schedule computations leaving D are safe and obey the IOIP condition. To prove these we
would like to use the Cosmos model simulation theorem. However, the theorem guarantees
such properties only for all block schedules that are suitable for the simulation. This makes
the direct application of the order reduction theorem impossible. Instead, we have to state the
analogous order reduction theorem but for suitable schedules. We proceed in the way shown
by Baumann.
2.4.1 Order Reduction for Suitable Schedules
We redefine the predicates from Section 2.2:
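Definition 2.46 (Verified Cosmos model for Suitable Schedules). For θ ∈ Θ∗Sc, o ∈ Ω∗Sc, configuration
D ∈ CSc and safety property PSc the predicates
\[
\begin{aligned}
\mathit{safety}(D, P_{S_c}, \mathit{suit}) &\;\stackrel{\mathrm{def}}{\equiv}\; \forall \theta.\ \mathit{suit}(\theta) \wedge \mathit{comp}(D.M, \theta) \implies \exists o \in \Omega^*_{S_c}.\ \mathit{safe}_{P_{S_c}}(D, \langle \theta, o \rangle) \\
\mathit{safety}_{IP}(D, P_{S_c}, \mathit{suit}) &\;\stackrel{\mathrm{def}}{\equiv}\; \forall \theta.\ \mathit{IPsched}(\theta) \wedge \mathit{suit}(\theta) \wedge \mathit{comp}(D.M, \theta) \implies \exists o \in \Omega^*_{S_c}.\ \mathit{safe}_{P_{S_c}}(D, \langle \theta, o \rangle) \\
\mathit{IOIP}_{IP}(D, \mathit{suit}) &\;\stackrel{\mathrm{def}}{\equiv}\; \forall \theta.\ \mathit{IPsched}(\theta) \wedge \mathit{suit}(\theta) \wedge \mathit{comp}(D.M, \theta) \implies \mathit{IOIP}(\theta)
\end{aligned}
\]
denote the safety for all schedules suitable for the simulation, and the safety and IOIP condition
for all suitable IP-schedules.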
Based on the previously shown lemmas from Section 2.2 and a property from [Bau14b]
stating that equivalent reordering preserves suitability, Baumann showed
a stronger reduction theorem that allows one to transfer safety properties from the subset of
IP-schedules suitable for Cosmos model simulation to suitable arbitrarily interleaved schedules.
2 Model for Concurrent Systems and Simulation
Theorem 2.5 (IP-Schedule Order Reduction for Suitable Schedules). Given a simulation framework
RSaSc and a Cosmos model configuration D ∈ CSc for which it has been verified that all suitable
IP-schedules originating in D are safe wrt. ownership and a Cosmos machine property PSc . Moreover,
all suitable IP-schedule computations running out of D obey the IOIP condition. Then ownership
safety and PSc hold on all computations with a schedule suitable for simulation that starts in D.
safetyIP(D,PSc , suit) ∧ IOIPIP(D, suit) =⇒ safety(D,PSc , suit)
2.4.2 Applying the Cosmos Model Simulation Theorem
Since we can justify verification of ownership safety and other properties of the Cosmos machine
states for suitable schedules, we would like to see now whether not only the ownership safety
but also the well-formedness and well-behaviour of the concrete machine guaranteed by the
Cosmos model simulation theorem for suitable block schedule computations can be transferred
to any arbitrary interleaved suitable step sequences.
In contrast to the well-formedness and unit invariants, which are properties of a machine
state, the well-behaviour is a property of steps. Baumann resolved this issue by extending the
considered concrete Cosmos model Sc with some history information recording whether the
well-behaviour is violated or not, and showed that using the Cosmos model simulation theorem
for the original abstract and concrete models as well as the aforementioned model extension one
guarantees that the proven well-behaviour transfers to arbitrarily interleaved computations of
the extended concrete machine.
However, such an extension implies some restrictions on the step properties because of the
specific semantics of Cosmos machines. Recall that the Cosmos machine transition function is
based on the unit's transition function, which takes as a parameter a portion of memory restricted
to the reads-set. Recomputing the history flag about the well-behaviour in every step of the
machine is therefore possible only if the well-behaviour does not talk about the memory
content at addresses outside the reads-set. This fact was not mentioned in the original
work [Bau14b], but it is easy to notice if one looks closely at how such an extension could be
made.
To see this, we extend our concrete model Sc in detail to S′c in the way introduced in
Baumann's work:
• For components X ∈ {A,V,R,nu, E} we have
S′c.X = Sc.X
• The unit configuration is extended with the history information corresponding to the well-
behaviour.
S′c.U = (k ∈ Sc.U , wb ∈ B)
The flag wb is computed in every step of the machine and initially will be equal to 1.
• The reads-set is equal to the one from the original model
S′c.reads(u,m, in) = Sc.reads(u.k,m, in)
• For components Y ∈ {IP, IO,OT }we have
S′c.Y (u,m, in) = Sc.Y (u.k,m, in)
• The unit’s transition function additionally records the history of well-behaviour
\[
S'_c.\delta(u, m, in) =
\begin{cases}
((k', wb'),\, m') & : S_c.\delta(u.k, m, in) = (k', m') \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
where wb′ = u.wb ∧ RSaSc.wb((u, ⌈m⌉), in, par). Since m is the memory restricted by the
reads-set and the predicate wb might state properties on values at other addresses, one
has to restrict such properties only to the reads-set.
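The extension of the unit transition function can be illustrated by a small sketch. It is only a toy under stated assumptions: units are plain Python values, the partial memory is a dictionary, the simulation parameter par is omitted, and the names extend_delta, toy_delta and toy_wb are hypothetical and not part of the Cosmos model.

```python
def extend_delta(delta, wb):
    """Wrap a unit transition function so that it also tracks the history flag wb.

    delta(k, m, inp) returns (k_next, m_next) or None if it is undefined.
    wb(k, m, inp) is the well-behaviour predicate evaluated on the memory m
    that is already restricted to the reads-set.
    """
    def delta_ext(u, m, inp):
        k, flag = u                      # u = (k, wb) as in the extended model
        result = delta(k, m, inp)
        if result is None:               # original transition undefined
            return None
        k_next, m_next = result
        # wb' = u.wb AND well-behaviour of the current step
        return (k_next, flag and wb(k, m, inp)), m_next
    return delta_ext


# Hypothetical toy unit: a counter that adds the value read at address inp.
def toy_delta(k, m, inp):
    if inp not in m:
        return None
    return k + m[inp], {inp: m[inp]}


# Hypothetical well-behaviour: steps may only read non-negative values.
def toy_wb(k, m, inp):
    return m.get(inp, 0) >= 0
```

Once a step violates the toy well-behaviour, the flag stays false in all later configurations, which is the intended role of the history information.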
Hence, to be able to transfer the well-behaviour to arbitrary interleaving schedules of the ex-
tended machine computation we make the following assumption.
Assumption 2.5 (Well-Behaviour Restriction). For a simulation framework RSaSc , a unit configura-
tion u ∈ Uc, memory m : Ac → Vc, an input in ∈ Ec and a simulation parameter par ∈ P we require
that the well-behaviour property of a computation step does not depend on memory content outside the
reads-set R = Sc.reads(u,m, in) of the concrete model Sc:
\[
R^{S_a}_{S_c}.wb((u, m), in, par) \iff \big( \forall m'.\ m'|_R = m|_R \implies R^{S_a}_{S_c}.wb((u, m'), in, par) \big)
\]
where m′ : Ac → Vc.
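For intuition, the restriction can be tested by brute force on a tiny finite memory. The sketch below is an illustration only: it checks the slightly more symmetric statement that wb yields the same result for every memory agreeing with m on the reads-set R, it assumes a small value domain, and the function name depends_only_on_reads is hypothetical.

```python
from itertools import product


def depends_only_on_reads(wb, reads, u, m, inp, values=(0, 1)):
    """Check that wb(u, m2, inp) == wb(u, m, inp) for all m2 with m2|R == m|R.

    m is a dictionary over a small address set; addresses outside the
    reads-set R are varied over the finite domain `values`.
    """
    R = reads(u, m, inp)
    free = [a for a in m if a not in R]          # addresses outside the reads-set
    for combo in product(values, repeat=len(free)):
        m2 = dict(m)
        m2.update(zip(free, combo))              # m2 agrees with m on R
        if wb(u, m2, inp) != wb(u, m, inp):
            return False
    return True
```

A predicate that inspects an address outside the reads-set fails this check, which is exactly the situation Assumption 2.5 rules out.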
Given the extended Cosmos model S′c, the transformation of the simulation framework RSaSc,
sinvSaSc, and uinvSaSc into the framework RSaS′c for the abstract and extended concrete machines
and the corresponding predicates for S′c is mere bookkeeping, and we skip the details here.
Now, for a configuration D ∈ CS′c we can define a Cosmos machine safety property denoting
the well-behaviour in the past before D:
\[
W(D) \;\stackrel{\mathrm{def}}{\equiv}\; \forall p \in \mathbb{N}_{nu}.\ D.u_p.wb
\]
Definition 2.47 (Simulation Hypothesis). Finally, we introduce the predicate simh to denote
the simulation hypotheses of the concurrent simulation theorem for the framework RSaS′c , start
configurations D ∈ CS′c , E ∈ CSa , and a simulation parameter par ∈ P . Moreover, in addition
to the safety property PSa needed for the concurrent simulation in Theorem 2.4, we include any
other property P ′Sa : CSa → B involved in the property transfer from the abstract machine to
the concrete one (considered below in Section 2.5).
Using the shorthand
P˜Sa(e) ≡ PSa(e) ∧ P ′Sa(e)
for e ∈ CSa we formally define
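\[
\begin{aligned}
\mathit{simh}(D, E, P'_{S_a}, par) \;\stackrel{\mathrm{def}}{\equiv}\;
& \text{(i)}\quad \mathit{uinv}(D, par) \wedge \forall p \in U_{cp}(D.M, par).\ \mathit{wf}_p(D.M) \\
& \text{(ii)}\quad \exists p \in \mathbb{N}_{nu}.\ CP_p(D.M, par) \wedge W(D) \\
& \text{(iii)}\quad \mathit{csim}(D.M, E, par) \wedge \mathit{sinv}(D, E, par) \\
& \text{(iv)}\quad \mathit{safety}_{cB}(E, \tilde{P}_{S_a}, par) \wedge \mathit{scwf}_{cB}(E.M, par) \\
& \text{(v)}\quad \mathit{IPCP}(R^{S_a}_{S'_c}, par)
\end{aligned}
\]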
Baumann in his work proved two corollaries of the concurrent simulation theorem that allow
us to apply the order reduction on the concrete level.
Corollary 2.2 (Simulating IP-Schedules are Safe and Well-Behaved). Assuming a pair of simulating
Cosmos machine configurations D ∈ CS′c and E ∈ CSa , if the simulation hypotheses hold for the
simulation framework RSaS′c , the corresponding predicates, and some parameter par ∈ P , then all suitable
IP-schedule computations running out of D are ownership-safe and well-behaved.
∀D,E, P ′Sa , par . simh(D,E, P ′Sa , par) =⇒ safetyIP(D,W, suit)
Corollary 2.3 (IOIP Assumption for Simulating IP-Schedules). Given a Cosmos machine sim-
ulation constrained as in the previous corollary with start configuration D, then every IP-schedule that
is suitable for the simulation and running out of D fulfills the IOIP condition.
∀D,E, P ′Sa , par . simh(D,E, P ′Sa , par) =⇒ IOIPIP(D, suit)
Therefore, applying these corollaries and Theorem 2.5 we easily reach our goal.
Corollary 2.4 (Ownership Transfer for Simulating Block Machines). Assuming a pair of simulating
Cosmos machine configurations D ∈ CS′c and E ∈ CSa , if the simulation hypotheses hold for
the simulation framework RSaS′c , corresponding predicates, and some parameter par ∈ P , then all suitable
computations running out of D are ownership-safe and well-behaved.
∀D,E, P ′Sa , par . simh(D,E, P ′Sa , par) =⇒ safety(D,W, suit)
Note that along with the well-behaviour we could also easily transfer the well-formedness and
unit invariants in the same manner. Therefore, we have shown that verification of the considered
properties for block schedules is justified. In other words, the order reduction applied on the
concrete level does not influence these properties.
2.5 Property Transfer From Abstract to Concrete Level
For constructing concurrent model stacks for pervasive verification one often needs to transfer
the software conditions and safety policy from the abstract level down to the concrete machine
computations which can be treated as an abstract level of another underlying simulation layer.
For instance, we have already seen that the required well-behaviour can follow from the software
conditions of the abstract model and be guaranteed by the sequential simulation theorem. Now,
having the Cosmos model simulation theorem between Sa and S′c with the framework RSaS′c,
invariants sinvSaSc, uinvSaSc, the property PSa as well as any additional property P ′Sa of the abstract
machine, we can also show how to transfer P˜Sa (see Definition 2.47) to some properties on the
concrete level. To reach this goal we follow an approach suggested by Baumann in
his work and based on ideas from [CL98].
Since generally we are dealing with different machines Sa and S′c, we cannot translate one-to-
one the abstract property P˜Sa into some property of the concrete machine. Moreover, we require
that P˜Sa holds at the beginning and the end of all complete consistency block executions for the
abstract machine. To be able to transfer the aforementioned property to the concrete level, one
needs to couple the configurations of both machines by the simulation relation. However, it
is not possible in the context of the incomplete consistency block execution. For the units not
being in consistency points we only require the shared invariant to hold. This has an influence
on the sort of properties that can be transferred. Baumann suggested distinguishing between
local and global properties and introduced a requirement that a property to be transferred must
be divisible.
Definition 2.48 (Divisible Cosmos machine Safety Property). We say that P : CS → B is
a divisible safety property of a Cosmos model S iff for any configuration C ∈ CS it has the
following structure
P (C) = Pg(C) ∧ ∀p ∈ Nnu. Pl(C, p)
where Pg is a global property which depends only on shared resources and the ownership state,
and Pl constitutes local properties for each unit of the system. Consequently they are constrained
as shown below for any C,C ′ ∈ CS :
\[
C \stackrel{s}{\sim} C' \wedge C \stackrel{o}{\sim} C' \implies P_g(C) = P_g(C')
\]
\[
\forall p.\ C \approx_p C' \implies P_l(C, p) = P_l(C', p)
\]
Therefore, we have to translate global properties in all intermediate configurations and for
proving them we can use the shared invariant. In turn, local properties can only be used in
consistency points where the simulation relation holds.
Definition 2.49 (Simulated Cosmos machine Property). Let P˜Sa from Definition 2.47 be a divisible
Cosmos machine safety property on CSa in the simulation between two Cosmos models Sa
and S′c. Then for a given simulation parameter par ∈ P a simulated divisible Cosmos machine
property Qˆ[P˜Sa , par ] : CS′c → B can be derived by solving the following formula², which states
for any configuration E ∈ CSa being completely consistent with D ∈ CS′c that Qˆ[P˜Sa , par ] holds
in D iff P˜Sa holds in E.
∀D,E. (sinv(D,E, par) ∧ ∀p. csimp(D.M,E, par)) =⇒ Qˆ[P˜Sa , par ](D) = P˜Sa(E)
Consequently the following constraints must hold for Qˆ[P˜Sa , par ]:
∀D,E. Pg(E) ∧ sinv(D,E, par) =⇒ Qˆ[P˜Sa , par ]g(D)
∀D,E, p. Pl(E, p) ∧ csimp(D.M,E, par) =⇒ Qˆ[P˜Sa , par ]l(D, p)
where Pg , Pl, Qˆ[P˜Sa , par ]g , Qˆ[P˜Sa , par ]l are global and local properties of P˜Sa and Qˆ[P˜Sa , par ]
respectively according to the Definition 2.48.
Additionally, we relax the definition of the simulated properties to so-called incompletely
simulated Cosmos machine properties because not all units need to be in their consistency points.
Definition 2.50 (Incompletely Simulated Cosmos machine Property). Given the divisible sim-
ulated Cosmos machine property Qˆ[P˜Sa , par ] from Definition 2.49 we define an incompletely
simulated Cosmos machine property Q[P˜Sa , par ] : CS′c → B in the following way:
\[
Q[\tilde{P}_{S_a}, par](D) \;\stackrel{\mathrm{def}}{\equiv}\; \hat{Q}[\tilde{P}_{S_a}, par]_g(D) \wedge \forall p \in U_{cp}(D.M, par).\ \hat{Q}[\tilde{P}_{S_a}, par]_l(D, p)
\]
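The difference between a divisible property and its incompletely simulated variant can be made concrete with a toy sketch. Everything here is an assumption made for illustration: configurations are dictionaries, and the helper names divisible, incompletely_simulated and in_cp are hypothetical.

```python
def divisible(P_g, P_l, units):
    """P(C) = P_g(C) and, for all units p, P_l(C, p)."""
    def P(C):
        return P_g(C) and all(P_l(C, p) for p in units)
    return P


def incompletely_simulated(Q_g, Q_l, units_in_cp):
    """Q(D) = Q_g(D) and Q_l(D, p) only for units p in consistency points."""
    def Q(D):
        return Q_g(D) and all(Q_l(D, p) for p in units_in_cp(D))
    return Q


# Toy configuration: unit 1 is in the middle of a block and its local
# state temporarily violates the local property.
D = {"shared": 0, "local": {0: 1, 1: -1}, "cp": {0: True, 1: False}}
Q_g = lambda C: C["shared"] == 0
Q_l = lambda C, p: C["local"][p] > 0
in_cp = lambda C: [p for p, flag in C["cp"].items() if flag]
```

In this toy configuration the full divisible property fails because of unit 1, while the incompletely simulated property holds, since unit 1 is not in a consistency point.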
Finally, we can show that P˜Sa is translated to the incompletely simulated property maintained
on the concrete level. This fact is stated in the following theorem proven in [Bau14b].
² i.e., by finding the translated predicate Qˆ satisfying the formula
Theorem 2.6 (Simulated Safety Property Transfer). Given are the simulation framework RSaS′c and
all the corresponding predicates including the abstract machine safety properties PSa and P ′Sa such that
the Cosmos model simulation theorem holds for start configurations D ∈ CS′c , E ∈ CSa , and the
simulation parameter par ∈ P . If the simulation hypotheses are fulfilled and the property P˜Sa (given
in Definition 2.47 and verified for all complete consistency block machine computations starting in E)
translates into the incompletely simulated Cosmos machine property Q[P˜Sa , par ]³, then any suitable
Cosmos machine schedule leaving D is safe wrt. the ownership, Q[P˜Sa , par ] holds for all reachable
configurations, and all implementing computations are well-behaved.
Using the shorthand
PSc(d) ≡W (d) ∧Q[P˜Sa , par ](d)
with d ∈ CS′c , we formally state the theorem as
∀D,E, P ′Sa , par . simh(D,E, P ′Sa , par) =⇒ safety(D,PSc , suit)
Moreover, it is also practical to know that the simulated properties hold completely for com-
plete consistency block machine computations.
Corollary 2.5 (Complete Simulated Property Transfer). For complete consistency block machine
computations on the concrete level, Theorem 2.6 allows us to transfer the property P˜Sa into the
completely simulated property Qˆ[P˜Sa , par ] if it can be derived. Formally, for PˆSc(d) ≡ W (d) ∧
Qˆ[P˜Sa , par ](d) with d ∈ CS′c , we state
∀D,E, P ′Sa , par . simh(D,E, P ′Sa , par) =⇒ safetycB (D, PˆSc , suit)
³ with the properties of Qˆ stated above
3 Formal Model of MIPS-86
In this thesis we consider a multi-core MIPS-86 model defined by Sabine Schmaltz in her doctoral
dissertation [Sch13]. The model is based on a sequential MIPS instruction set architecture
(ISA) from [KMP14] and is extended with components typical for x86-64 architectures: store
buffers (SB), memory management units (MMU) with translation lookaside buffers (TLB), devices,
and local and I/O advanced programmable interrupt controllers (APIC) serving for the delivery
of inter-processor interrupts (IPI) and device interrupts.
Since in the scope of the thesis we are not interested in the communication between the pro-
cessors via IPIs as well as device accesses, we stick to a simplified version containing only two of
the aforementioned components and supporting the external interrupts. Using the definitions
from [Sch13], we first introduce the single core MIPS-86 model, and then provide the semantics
for the corresponding multi-core machine. We also adapt the original definitions where it is
necessary to match our concurrent simulation theory from the previous chapter. Moreover, we
extend the ISA with an instruction from [Kov13] which exists in x86-64 architectures and makes
it easier for a programmer to guarantee sequentially consistent memory in the presence of store
buffers.
3.1 Instruction Set Architecture Overview
An overview of the multi-core MIPS-86 model used in this thesis is depicted in Figure 3.1.
The abstraction of the sequential processor cores features atomic fetch-and-execute transitions
which can be justified only in the absence of self-modifying code. Namely, if an instruction
being fetched from the memory by a core cannot be changed by other processors present in the
model, then we can reorder the fetch cycle in a way that it appears exactly before the instruction
execution.
The processor core communicates with its TLB performing and caching address translations
used by the processor to establish a desired virtual memory abstraction. The SB caches memory
write requests of the core and provides data in case of read requests if possible. Both compo-
nents can also perform their steps independently from the core. For instance, the store buffer
can commit its pending stores to the memory. Note that the memory depicted in Figure 3.1
and considered in our multi-core model is already an abstraction of a cached shared memory
system implemented and proven to be sequentially consistent in [KMP14].
As suggested by Schmaltz we provide the MIPS-86 model with an order of processor com-
ponent steps unknown to the programmer. Such a non-deterministic behavior is modeled by
a deterministic automaton taking among its inputs some information specifying which com-
ponent performs a particular transition. We will use this approach not only for a single core
model semantics, but also for the multi-core machine where the order of processor executions
is determined by their indices.
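The idea of modelling non-determinism by a deterministic automaton with a scheduling input can be sketched as follows. This is a simplified toy and not the MIPS-86 transition function: the state consists only of one store buffer and a memory, and the input names "core" and "sb" as well as the functions step and run are hypothetical.

```python
def step(state, inp):
    """Deterministic transition: the input says which component steps."""
    sb, mem = state
    kind, payload = inp
    if kind == "core":                  # core step: buffer a write (addr, value)
        return sb + [payload], mem
    if kind == "sb" and sb:             # store buffer step: commit the oldest write
        (a, v), rest = sb[0], sb[1:]
        return rest, {**mem, a: v}
    return state                        # assumption: a disabled step changes nothing


def run(state, schedule):
    """Apply a whole input sequence (the schedule) to a start state."""
    for inp in schedule:
        state = step(state, inp)
    return state
```

The same start state leads to different results for different input sequences, so all the non-determinism is moved into the choice of the schedule.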
If a few components of the model perform synchronous steps as a response to an action of a
certain component, we call these steps passive in comparison to the active step triggering them.
[Figure 3.1: Overview of MIPS-86 Components. Each processor consists of a core with its TLB and SB and receives external interrupts; all processors are connected to the memory.]
[Figure 3.2: Types and Fields of MIPS instructions.
I-type: opc [31:26], rs [25:21], rt [20:16], imm [15:0]
R-type: opc [31:26], rs [25:21], rt [20:16], rd [15:11], sa [10:6], fun [5:0]
J-type: opc [31:26], iindex [25:0]]
A typical passive component of a step is the memory changing its configuration as a result of
write accesses from the store buffer.
3.1.1 Instruction Set
The MIPS-86 ISA provides instructions of I-, J-, and R-types. The I-type instructions operate
on two registers of the core and a so-called immediate constant, whereas the R-type instructions
allow operations involving three registers. The J-type instructions are used for absolute jumps
in the assembly code. Any instruction is represented as a 32-bit binary word and has its own
layout depending on the type (see Figure 3.2). The fields rs, rt, and rd stand for the source,
target, and destination registers respectively. Another field sa is a shift amount used for shift
operations.
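The field layout of Figure 3.2 can be illustrated by a small decoder. It is a minimal sketch that assumes the bit positions listed in the figure (opc in bits 31:26, rs in 25:21, rt in 20:16, rd in 15:11, sa in 10:6, fun in 5:0, imm in 15:0, iindex in 25:0); the function name fields is hypothetical.

```python
def fields(word):
    """Extract all instruction fields from a 32-bit instruction word.

    Which fields are meaningful depends on the instruction type:
    I-type uses opc, rs, rt, imm; R-type uses opc, rs, rt, rd, sa, fun;
    J-type uses opc and iindex.
    """
    return {
        "opc": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "fun": word & 0x3F,
        "imm": word & 0xFFFF,
        "iindex": word & 0x3FFFFFF,
    }


# Example: an I-type word with opc = 001000 (addi), rs = 1, rt = 2, imm = 5.
addi_word = (0b001000 << 26) | (1 << 21) | (2 << 16) | 5
```

Calling fields(addi_word) yields opc = 8, rs = 1, rt = 2 and imm = 5.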
An overview of available instructions is given in Tables 3.1–3.3. Note that the exact semantics
of the instruction execution will be considered later when we define the transition function
of the single core MIPS-86. The tables give only approximate effects. For example,
m denotes the full memory system including the store buffers, and the table description
does not show which component is involved in a word memory access. Moreover, the instructions
whose mnemonic ends with u perform unsigned arithmetic with binary numbers.
A particular case that is worth paying attention to is the instruction sltiu, which interprets the
sign-extended immediate constant as a binary number, although in case of imm[15] = 1 one obviously
has 〈sxt(imm)〉 ≠ 〈imm〉.
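The remark about sltiu can be checked with a short computation. The sketch assumes that register values are represented as non-negative Python integers below 2^32; the helper names sxt16 and sltiu are chosen for illustration.

```python
def sxt16(imm):
    """Sign-extend a 16-bit immediate to 32 bits, returned as an unsigned number."""
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def sltiu(rs, imm):
    """Unsigned comparison of rs with the sign-extended immediate."""
    return 1 if rs < sxt16(imm) else 0
```

For imm = 0x8000 the sign-extended constant is 0xFFFF8000 when read as a binary number, so a register value such as 0x10000 compares as smaller, although it is larger than the plain 16-bit value 0x8000.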
Most of the operations supported in the MIPS-86 ISA deal with the registers from the general
purpose register file (GPR). However, the coprocessor instructions serve for returning from an
interrupt handler to the interrupted program and for moving data between GPRs and special
purpose registers (SPR) important in the context of system programming. The SPRs available
in the considered model are briefly described in Table 3.4. Note that in contrast to [Sch13]
we introduce a register containing a processor identifier. The reason is that the local APIC
containing such information is not present in our model. We assume that the processor ID is
hard-wired into the corresponding register and writes to this register have no effect.

opc      rt     Mnemonic  Assembler-Syntax   Effect
Data Transfer
100 011         lw        lw rt rs imm       rt = m(rs + sxt(imm))
101 011         sw        sw rt rs imm       m(rs + sxt(imm)) = rt
Arithmetic, Logical Operation, Test-and-Set
001 000         addi      addi rt rs imm     rt = rs + sxt(imm)
001 001         addiu     addiu rt rs imm    rt = rs + sxt(imm)
001 010         slti      slti rt rs imm     rt = (rs < sxt(imm) ? 1₃₂ : 0₃₂)
001 011         sltiu     sltiu rt rs imm    rt = (rs < sxt(imm) ? 1₃₂ : 0₃₂)
001 100         andi      andi rt rs imm     rt = rs ∧ zxt(imm)
001 101         ori       ori rt rs imm      rt = rs ∨ zxt(imm)
001 110         xori      xori rt rs imm     rt = rs ⊕ zxt(imm)
001 111         lui       lui rt imm         rt = imm0¹⁶
Branch
000 001  00000  bltz      bltz rs imm        pc = pc + (rs < 0 ? imm00 : 4₃₂)
000 001  00001  bgez      bgez rs imm        pc = pc + (rs ≥ 0 ? imm00 : 4₃₂)
000 100         beq       beq rs rt imm      pc = pc + (rs = rt ? imm00 : 4₃₂)
000 101         bne       bne rs rt imm      pc = pc + (rs ≠ rt ? imm00 : 4₃₂)
000 110  00000  blez      blez rs imm        pc = pc + (rs ≤ 0 ? imm00 : 4₃₂)
000 111  00000  bgtz      bgtz rs imm        pc = pc + (rs > 0 ? imm00 : 4₃₂)
Table 3.1: I-Type MIPS-86 Instructions

opc      Mnemonic  Assembler-Syntax  Effect
Jumps
000 010  j         j iindex          pc = bin₃₂(pc+4₃₂)[31:28]iindex00
000 011  jal       jal iindex        R31 = pc + 4₃₂,
                                     pc = bin₃₂(pc+4₃₂)[31:28]iindex00
Table 3.2: J-Type MIPS-86 Instructions
opcode  fun      rs     Mnemonic  Assembler-Syntax  Effect
Shift Operation
000000  000 000         sll       sll rd rt sa      rd = sll(rt,sa)
000000  000 010         srl       srl rd rt sa      rd = srl(rt,sa)
000000  000 011         sra       sra rd rt sa      rd = sra(rt,sa)
000000  000 100         sllv      sllv rd rt rs     rd = sll(rt,rs)
000000  000 110         srlv      srlv rd rt rs     rd = srl(rt,rs)
000000  000 111         srav      srav rd rt rs     rd = sra(rt,rs)
Arithmetic, Logical Operation
000000  100 000         add       add rd rs rt      rd = rs + rt
000000  100 001         addu      addu rd rs rt     rd = rs + rt
000000  100 010         sub       sub rd rs rt      rd = rs − rt
000000  100 011         subu      subu rd rs rt     rd = rs − rt
000000  100 100         and       and rd rs rt      rd = rs ∧ rt
000000  100 101         or        or rd rs rt       rd = rs ∨ rt
000000  100 110         xor       xor rd rs rt      rd = rs ⊕ rt
000000  100 111         nor       nor rd rs rt      rd = ¬(rs ∨ rt)
Test-and-Set Operation
000000  101 010         slt       slt rd rs rt      rd = (rs < rt ? 1 : 0)
000000  101 011         sltu      sltu rd rs rt     rd = (rs < rt ? 1 : 0)
Jumps, System Call
000000  001 000         jr        jr rs             pc = rs
000000  001 001         jalr      jalr rd rs        rd = pc + 4, pc = rs
000000  001 100         sysc      sysc              System Call
Synchronizing Memory Operations
000000  111 111         cas       cas rd rs rt      rd′ = m(rs),
                                                    m′(rs) = (rd = m(rs) ? rt : m(rs)),
                                                    flushes the SB
000000  111 110         mfence    mfence            flushes the SB
000000  111 101         locksw    locksw rs rt      m(rs) = rt,
                                                    flushes the SB
TLB Instructions
000000  111 011         flush     flush             flushes TLB
000000  111 010         invlpg    invlpg rd rs      flushes TLB translations
                                                    for addr. rd from ASID rs
Coprocessor Instructions
010000  011 000  10000  eret      eret              Exception Return
010000           00100  movg2s    movg2s rd rt      spr(rd) = gpr(rt)
010000           00000  movs2g    movs2g rd rt      gpr(rt) = spr(rd)
Table 3.3: R-Type MIPS-86 Instructions

index i  shorthand for i and i₅  description
0        sr                      status register (masks for interrupts)
1        esr                     exception status register
2        eca                     exception cause register
3        epc                     exception pc (return address after interrupt handling)
4        edata                   exception data (effective address on page fault)
5        pto                     page table origin
6        asid                    address space identifier
7        mode                    mode register ∈ {0³¹1, 0³²}
8        emode                   exception mode register ∈ {0³¹1, 0³²}
9        pid                     processor identifier
Table 3.4: MIPS-86 Special Purpose Registers.

3.1.2 Store Buffers
The simplest form of the store buffer we deal with in the MIPS-86 model is a first-in-first-out
queue of pending memory write accesses to physical addresses. Any memory read access generated
by the processor can then be serviced in the store buffer by finding the most recent SB
entry for a given address. If such an entry is not present, the data is read from the memory.
Using the store buffers in the hardware implementation allows memory accesses to be performed
faster while the memory system is still busy with another operation. However, since there is
no communication between the store buffers of processors in a multi-core system, the memory
visible for the programmer cannot be treated as sequentially consistent as long as no specific
programming discipline is applied.
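Why store buffers break sequential consistency can be seen on the classical store-buffering example, sketched below as a toy simulation. The two-instruction programs, the variable names x and y, and the schedule encoding are assumptions made for illustration and do not come from the MIPS-86 model.

```python
def run(schedule):
    """Simulate two processors with FIFO store buffers over a shared memory.

    Processor 0 executes: x = 1; r0 = y.  Processor 1 executes: y = 1; r1 = x.
    A schedule is a list of ("core", p) and ("sb", p) steps.
    """
    mem = {"x": 0, "y": 0}
    sb = {0: [], 1: []}
    prog = {0: [("w", "x", 1), ("r", "y")], 1: [("w", "y", 1), ("r", "x")]}
    pc = {0: 0, 1: 0}
    regs = {}
    for kind, p in schedule:
        if kind == "core":
            op = prog[p][pc[p]]
            pc[p] += 1
            if op[0] == "w":                      # writes go to the own store buffer
                sb[p].append((op[1], op[2]))
            else:                                 # reads see the newest own buffered write
                hits = [v for a, v in sb[p] if a == op[1]]
                regs[p] = hits[-1] if hits else mem[op[1]]
        else:                                     # "sb": commit the oldest pending write
            a, v = sb[p].pop(0)
            mem[a] = v
    return regs


# Both writes are still buffered when the reads are executed.
weak = [("core", 0), ("core", 1), ("core", 0), ("core", 1), ("sb", 0), ("sb", 1)]
# Each store buffer is drained before the read of the same processor.
drained = [("core", 0), ("sb", 0), ("core", 0), ("core", 1), ("sb", 1), ("core", 1)]
```

With the schedule weak both processors read 0, a result that no interleaving on a sequentially consistent memory can produce. Draining the store buffer before each read, as in the schedule drained, excludes this outcome.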
In order to allow the programmer to obtain the sequentially consistent view of accessed mem-
ory in the presence of the store buffers, hardware developers provide serializing processor in-
structions having an explicit flushing effect on the SB. In our MIPS-86 model we offer three of
them. The memory fence instruction mfence is simply used for draining the SB. The compare-
and-swap cas performs an atomic conditional memory update whereas the locked write locksw
borrowed from [Kov13] acts as the latter one but without a condition on the memory write.
3.1.3 Address Translation
The memory management unit of the MIPS-86 processor being in user mode performs a 2-level
translation from 32-bit virtual to physical addresses at the granularity of memory pages consisting
of 2¹² consecutive bytes. Therefore, a page can be addressed with 20 bits and we treat the
20 most significant bits of a virtual address as a virtual page address.
The first-level page table, the so-called page directory, is accessed at the page table origin taken
from the SPR register pto. This table is used for translating the first 10 bits of the page address to
the memory location at which a page table of the second level (or terminal page table) resides in
the memory. Then a terminal page table entry with an index represented by the second 10 bits of
the page address provides a physical page address. Note that entries on any level can be marked
as not present. This means that a corresponding translation is not set up or a memory page is
not in the memory. An attempt to access such a page table entry generates a page fault interrupt
returning the processor core to system mode where an operating system kernel or a hypervisor
is supposed to resolve the issue.
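The two-level scheme can be sketched as follows. The split of a virtual address into two 10-bit indices and a 12-bit page offset follows the text; the representation of page tables as nested dictionaries, where a missing key models a not-present entry, and the function names split and translate are assumptions made for illustration. The actual page table entry format is not modelled here.

```python
def split(va):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF


def translate(page_dir, va):
    """Toy two-level walk; returns None where the hardware would raise a page fault."""
    i1, i2, off = split(va)
    table = page_dir.get(i1)        # first level: locate the terminal page table
    if table is None:
        return None                 # directory entry not present
    ppa = table.get(i2)             # second level: physical page address
    if ppa is None:
        return None                 # terminal entry not present
    return (ppa << 12) | off
```

For instance, the virtual address 0x12345678 splits into the indices 0x48 and 0x345 and the offset 0x678.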
The result of traversing the graph of the page tables for the address translation is cached in
the tagged TLB in order to serve further processor translation requests in a faster way. For such
partial or complete address translations in our model we use the term walks. The tag used in the
TLB and walks is an address space identifier (ASID) associating translations with particular users.
Since operating systems and hypervisors may modify page tables, the translations present in
the TLB could become out of date. In order to allow consistent page table updates in the MIPS-
86 ISA we provide two instructions typical for x86-64 architectures. The flush operation just
empties the TLB. The instruction invlpg, however, removes all walks for a certain virtual page
address and a given ASID. Therefore, using the tagged TLB substantially reduces the number of
flushes during a switch between user processes or guests by operating systems and hypervisors
respectively.
The specification of the address translation and the TLB used in our MIPS-86 is presented in
detail later in Section 3.2.3.

interrupt level  shorthand  source    type      maskable  description
0                reset      external  abort     no        reset
1                I/O        external  repeat    yes       devices
2                ill        internal  abort     no        illegal instruction
3                mal        internal  abort     no        misaligned
4                pff        internal  repeat    no        page fault on fetch
5                pfls       internal  repeat    no        page fault on load/store
6                sysc       internal  continue  no        system call
7                ovf        internal  continue  yes       overflow
Table 3.5: MIPS-86 Interrupt Types and Priority.
3.1.4 Interrupts
The MIPS-86 model considered in the thesis supports an interrupt mechanism for internal and ex-
ternal interrupts listed in Table 3.5. The former are triggered in the processor during instruction
fetch and execution causing a specific event, e.g., an overflow in an arithmetic operation, a sys-
tem call used to transfer the control from a user process to an operating system for executing a
specific task, a page fault occurring when a required address translation is missing, etc. The latter
have an external source and are triggered by devices or the reset signal. When an interrupt
signal is raised and not masked, the processor switches to system mode and performs a so-called
jump to interrupt service routine (JISR) where a corresponding interrupt handler is supposed to
be called.
For resuming the execution after interrupt handling the MIPS-86 ISA provides the instruction
eret. Depending on the type of the interrupt, the execution of the interrupted instruction can
be repeated or aborted. Moreover, in case of the type continue the instruction is completed before
JISR and we proceed with the next instruction after returning from the interrupt.
The exact semantics of the interrupt mechanism in the MIPS-86 model is given later in this
chapter.
3.2 Single Core MIPS-86 ISA
Now, we define the single core MIPS-86 ISA in a top-down manner. We introduce the overall
machine configuration, and then describe each component separately in detail. At the end we
combine all definitions in the transition function of the MIPS-86 machine.
3.2.1 Configuration Overview
Definition 3.1 (Processor Configuration of MIPS-86). A processor configuration
\[
C_{proc} \;\stackrel{\mathrm{def}}{\equiv}\; (core \in C_{core},\ sb \in C_{sb},\ tlb \in C_{tlb})
\]
consists of the processor core core executing instructions according to the MIPS-86 ISA, the store
buffer sb, and the translation lookaside buffer tlb.
Definition 3.2 (Memory Configuration of MIPS-86). In the model we consider a simple byte-
addressable global memory of the type
\[
C_m \;\stackrel{\mathrm{def}}{\equiv}\; \mathbb{B}^{32} \to \mathbb{B}^{8}
\]
Definition 3.3 (Configuration of the Single Core MIPS-86). A configuration
\[
C_{MIPS} \;\stackrel{\mathrm{def}}{\equiv}\; (cpu \in C_{proc},\ m \in C_m)
\]
of the single core MIPS-86 ISA is represented by the processor configuration cpu and the global
memory component m.
3.2.2 Memory and Store Buffer
For the MIPS ISA memory we define effects of read, write, and compare-and-swap operations.
The two latter operations are expressed via a memory transition function executed in response
to the active steps of the processor core, store buffer, or TLB as we will see in the MIPS single
core transition function.
Definition 3.4 (Reading Bit-Strings from Byte Addressable Memory). For a memory m ∈ Cm,
an address a ∈ B32, and a number d ∈ N of bytes, we define
\[
m_d(a) \;\stackrel{\mathrm{def}}{\equiv}\;
\begin{cases}
m(a) & : d = 1 \\
m_{d-1}(a +_{32} 1_{32}) \circ m(a) & : \text{otherwise}
\end{cases}
\]
Since our ISA supports for simplicity only word accesses to the memory, we will use d = 4.
Definition 3.5 (Memory Transition Function). We define the memory transition function
δm : Cm × Σm → Cm
with the input alphabet
Σm
def≡ (B32 × B32) ∪ (B32 × B32 × B32)
such that
• (a, v) ∈ B32 × B32 – describes a write access to address a with value v, and
• (c, a, v) ∈ B32 × B32 × B32 – describes a compare-and-swap access to address a with
compare-value c and value v to be written in case of success.
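\[
\delta_m(m, in)(x) \;\stackrel{\mathrm{def}}{\equiv}\;
\begin{cases}
byte(\langle x \rangle - \langle a \rangle, v) & : in = (a, v) \wedge 0 \le \langle x \rangle - \langle a \rangle < 4 \\
byte(\langle x \rangle - \langle a \rangle, v) & : in = (c, a, v) \wedge m_4(a) = c \wedge 0 \le \langle x \rangle - \langle a \rangle < 4 \\
m(x) & : \text{otherwise}
\end{cases}
\]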
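The read function and the memory transition function can be mirrored by a short executable sketch. It rests on several assumptions that are not part of the definitions: the memory is a dictionary from integer addresses to byte values with unset bytes read as 0, words are Python integers, byte(i, v) is taken to be the i-th least significant byte of v, and wrap-around at the top of the address space is ignored. The names read and delta_m are chosen for illustration.

```python
def read(m, a, d=4):
    """m_d(a): concatenate d bytes, with the byte at address a least significant."""
    return sum(m.get(a + i, 0) << (8 * i) for i in range(d))


def delta_m(m, inp):
    """Memory transition for a write (a, v) or a compare-and-swap (c, a, v)."""
    if len(inp) == 2:
        a, v = inp
    else:
        c, a, v = inp
        if read(m, a) != c:          # failed comparison: memory unchanged
            return dict(m)
    m2 = dict(m)
    for i in range(4):               # update the four bytes of the word at a
        m2[a + i] = (v >> (8 * i)) & 0xFF
    return m2
```

A write of 0x11223344 to address 0x100 stores the byte 0x44 at 0x100, and a subsequent compare-and-swap with compare-value 0x11223344 succeeds, whereas one with a different compare-value leaves the memory unchanged.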
In turn, the store buffer is modelled as a sequence of entries representing writes of words by
the core.
Definition 3.6 (Store Buffer Configuration). The set of store buffer entries is given by the type
\[
C_{sbe} \;\stackrel{\mathrm{def}}{\equiv}\; (a \in \mathbb{B}^{30},\ v \in \mathbb{B}^{32})
\]
where a is a word address and v is a value to be written to the memory at the address a ◦ 00.
Therefore, the set of store buffer configurations can be defined as follows:
\[
C_{sb} \;\stackrel{\mathrm{def}}{\equiv}\; C^*_{sbe}
\]
In the MIPS-86 model a non-empty store buffer can make an active step independently from
the processor core. During such a step the SB commits a pending memory write specified by
the oldest entry present in its configuration.
We do not introduce an individual SB transition function and will formalize the effect of its
active and passive steps in the transition of the single core MIPS-86. Here, we provide some
auxiliary definitions needed for that.
Definition 3.7 (Store Buffer Hit Predicate). For a given store buffer configuration sb ∈ Csb and
a memory word address x ∈ B30, the predicate
\[ sbhit(sb, x) \stackrel{\text{def}}{\equiv} \exists j \in |sb| \,.\ sb[j].a = x \]
indicates a store buffer hit, meaning that a memory word write at this address is present in sb.
Definition 3.8 (Newest Store Buffer Hit Index). Given a store buffer configuration sb ∈ Csb
and a word address x ∈ B30, we define a partial function
\[ maxsbhit(sb, x) \stackrel{\text{def}}{\equiv} \max \{ j \mid sb[j].a = x \} \]
which computes the index of the newest SB entry among those for which the hit holds.
Definition 3.9 (Store Buffer Byte Value). For a store buffer configuration sb ∈ Csb and a byte
address x ∈ B32 we define a partial function sbv(sb, x) that computes a byte value forwarded
from the store buffer for the given address in case of the hit.
Let \( x_w \equiv x[31:2] \), \( x_b \equiv x[1:0] \), and \( k \equiv maxsbhit(sb, x_w) \). Then, if \( sbhit(sb, x_w) \) holds, we can
compute
\[ sbv(sb, x) \stackrel{\text{def}}{\equiv} byte(\langle x_b \rangle, sb[k].v) \]
The results of read accesses performed by the processor core can then be described in terms
of a memory system including the store buffer and the memory.
Definition 3.10 (Memory System). We define a function ms that, given these components, re-
turns the merged memory view seen by the processor core
\[ ms(sb, m)(x) \stackrel{\text{def}}{\equiv} \begin{cases} sbv(sb, x) & : sbhit(sb, x[31:2]) \\ m(x) & : \text{otherwise} \end{cases} \]
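Store buffer forwarding as described by Definitions 3.7 to 3.10 can be sketched as follows. This is a minimal sketch under assumptions: a store buffer is a Python list of (word address, value) pairs, the newest matching entry is the one with the greatest index (as Definition 3.8 states), memory is a dictionary with unmapped bytes reading as 0, and all function names simply mirror the definitions.

```python
def sbhit(sb, xw):
    """True if some pending write targets word address xw."""
    return any(a == xw for a, _ in sb)

def maxsbhit(sb, xw):
    """Index of the newest entry for xw; only meaningful if sbhit(sb, xw) holds."""
    return max(j for j, (a, _) in enumerate(sb) if a == xw)

def sbv(sb, x):
    """Byte forwarded from the store buffer for byte address x."""
    xw, xb = x >> 2, x & 0b11
    value = sb[maxsbhit(sb, xw)][1]
    return (value >> (8 * xb)) & 0xFF

def ms(sb, m, x):
    """Byte seen by the core at x: a pending write wins over the memory content."""
    return sbv(sb, x) if sbhit(sb, x >> 2) else m.get(x, 0)
```

With two pending writes to the same word, the core observes the value of the newer one, although the memory itself still holds the old content.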
3.2.3 Translation Lookaside Buffer
For any memory access by the processor in user mode, the virtual address is translated into a
physical one on the basis of the page tables residing in memory and the TLB. To perform
address translation, the MMU operates on the page tables to create, extend, complete,
and drop walks.
Definition 3.11 (TLB Configuration of MIPS-86). We define the set of configurations of the
TLB as
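\[ C_{tlb} \stackrel{\text{def}}{\equiv} 2^{C_{walk}} \]
where the set of walks Cwalk is given by
\[ C_{walk} \stackrel{\text{def}}{\equiv} (vpa \in \mathbb{B}^{20},\ asid \in \mathbb{B}^{6},\ level \in \{0, 1, 2\},\ pa \in \mathbb{B}^{20},\ r \in \mathbb{B}^{3},\ fault \in \mathbb{B}) \]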
with the following components:
• vpa – the virtual page address to be translated,
• asid – the address space identifier the translation belongs to,
• level – the current level of the walk, i.e. the number of remaining walk extensions needed
to complete the walk,
• pa – the physical page address of the page table to be accessed next, or, if the walk is
complete, the result of the translation,
• r – accumulated access rights, such that r[0] stands for write permission, r[1] for user
mode access, and r[2] expresses execute permission,
• fault – a flag indicating a page fault.
Now, we describe the structure of page tables and define the address translation for a given
virtual address, page table origin, and memory configuration.
Definition 3.12 (Page and Byte Index). We split a given virtual address va ∈ B32 in the following
way
\[ va \stackrel{\text{def}}{\equiv} va.px_2 \circ va.px_1 \circ va.px_0 \]
and define
• the second-level page index \( va.px_2 \stackrel{\text{def}}{\equiv} va[31:22] \),
• the first-level page index \( va.px_1 \stackrel{\text{def}}{\equiv} va[21:12] \),
• the byte offset \( va.px_0 \stackrel{\text{def}}{\equiv} va[11:0] \) of va.
Note that the concatenation \( va.vpa \equiv va.px_2 \circ va.px_1 \) constitutes the virtual page address to be
translated.
Definition 3.13 (Page Table Entry). A page table entry pte ∈ B32 consists of
• \( pte.pa \stackrel{\text{def}}{\equiv} pte[31:12] \) – the physical page address of the next page table or, if the page
table is the terminal one, the resulting physical page address for a translation,
• \( pte.p \stackrel{\text{def}}{\equiv} pte[11] \) – the present bit,
• \( pte.r \stackrel{\text{def}}{\equiv} pte[10:8] \) – the access rights for pages accessed via a translation that involves
the page table entry: pte.r[0] indicates the write permission, pte.r[1] is the permission for
user accesses, and pte.r[2] allows instructions to be fetched and executed,
• \( pte.a \stackrel{\text{def}}{\equiv} pte[7] \) – the accessed flag indicating whether the MMU has already used the
page table entry for a translation.
Definition 3.14 (Page Table Entry Address). For a physical page address pa ∈ B20 of a page
table and a page index i ∈ B10, we define the corresponding page table entry address as
\[ ptea(pa, i) \stackrel{\text{def}}{\equiv} pa \circ 0^{12} +_{32} 0^{20} \circ i \circ 00 \]
The page table entry address needed to extend a given walk w ∈ Cwalk is then defined as
\[ ptea(w) \stackrel{\text{def}}{\equiv} ptea(w.pa, (w.vpa \circ 0^{12}).px_{w.level}) \]
Definition 3.15 (Page Table Entry for a Walk). Given a memory m ∈ Cm and a walk w ∈ Cwalk,
we define the page table entry needed to extend this walk as
\[ pte(m, w) = m_4(ptea(w)) \]
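The address arithmetic of Definitions 3.12 and 3.14 is easy to check numerically. The following sketch represents bit-strings as Python integers; the function names px and ptea mirror the definitions, and the integer encoding is an assumption made for this example only.

```python
def px(va, level):
    """Page index of a 32-bit virtual address: px2 = va[31:22], px1 = va[21:12]."""
    if level == 2:
        return (va >> 22) & 0x3FF
    if level == 1:
        return (va >> 12) & 0x3FF
    raise ValueError("only levels 1 and 2 select a page index")

def ptea(pa, i):
    """Byte address of entry i (10 bits) in the page table at physical page pa (20 bits)."""
    return ((pa << 12) + (i << 2)) & 0xFFFFFFFF
```

Each page table entry occupies one word, so entry i lies 4·i bytes above the start of the page table page.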
Definition 3.16 (Walk Creation). We define the function
\[ winit : \mathbb{B}^{20} \times \mathbb{B}^{32} \times \mathbb{B}^{6} \to C_{walk} \]
which, given a virtual page address vpa ∈ B20, a page table origin pto ∈ B32, and an address
space identifier asid ∈ B6, returns the initial walk for the translation of vpa. The walk
\[ winit(vpa, pto, asid) \stackrel{\text{def}}{\equiv} w \]
is given by
\[ \begin{aligned}
w.vpa &= vpa & w.level &= 2 & w.r &= 111 \\
w.asid &= asid & w.pa &= pto[31:12] & w.fault &= 0
\end{aligned} \]
Note that in our specification of the MMU, the initial walk always has full rights (w.r = 111).
However, in every translation step, the rights associated with the walk can be restricted as
needed by the translation request made by the processor core.
Definition 3.17 (Sufficient Access Rights). For a pair of access rights r, r′ ∈ B3, we use
\[ r \le r' \stackrel{\text{def}}{\equiv} \forall j \in [0:2].\ r[j] \le r'[j] \]
to describe that the access rights r are weaker than r′, i.e. rights r′ are sufficient to perform an
access with rights r.
Definition 3.18 (Walk Extension). We define the function
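\[ wext : C_{walk} \times \mathbb{B}^{32} \times \mathbb{B}^{3} \to C_{walk} \]
which extends a given walk w ∈ Cwalk using a page table entry pte ∈ B32 and access rights
r ∈ B3 in such a way that
\[ wext(w, pte, r) \stackrel{\text{def}}{\equiv} w' \]
is given by
\[ \begin{aligned}
w'.vpa &= w.vpa \\
w'.asid &= w.asid \\
w'.level &= \begin{cases} w.level - 1 & : pte.p \\ w.level & : \text{otherwise} \end{cases} \\
w'.pa &= \begin{cases} pte.pa & : pte.p \\ w.pa & : \text{otherwise} \end{cases} \\
w'.r &= \begin{cases} w.r \land pte.r & : pte.p \\ w.r & : \text{otherwise} \end{cases} \\
w'.fault &= \lnot pte.p \lor \lnot (r \le (w.r \land pte.r))
\end{aligned} \]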
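Walk creation and extension can be sketched as follows. This is an illustrative encoding only: the Walk dataclass, the integer representation of rights (bit j of the integer stands for r[j]), and the boolean fault flag are assumptions made for the example; the field extraction from the page table entry follows Definition 3.13.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Walk:
    vpa: int
    asid: int
    level: int
    pa: int
    r: int
    fault: bool

def winit(vpa, pto, asid):
    """Initial walk: level 2, full rights, page table origin taken from pto[31:12]."""
    return Walk(vpa=vpa, asid=asid, level=2, pa=pto >> 12, r=0b111, fault=False)

def wext(w, pte, r):
    """Extend walk w with page table entry pte for a request with rights r."""
    present = bool((pte >> 11) & 1)
    rights = w.r & ((pte >> 8) & 0b111)
    # r <= rights holds bitwise iff no requested right is missing from rights.
    fault = (not present) or (r & ~rights & 0b111) != 0
    if not present:
        return replace(w, fault=fault)
    return replace(w, level=w.level - 1, pa=pte >> 12, r=rights, fault=fault)
```

A non-present entry leaves level, pa, and r untouched and only raises the fault flag, while a present entry with insufficient rights both advances the walk and marks it as faulting.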
Definition 3.19 (Complete Walk). A walk w ∈ Cwalk with w.level = 0 is called a complete walk:
\[ complete(w) \stackrel{\text{def}}{\equiv} w.level = 0 \]
Definition 3.20 (Setting Accessed Flag of a Page Table Entry). Given a page table entry pte ∈
B32, we define the function
\[ seta(pte) \stackrel{\text{def}}{\equiv} pte[a := 1] \]
which returns an updated page table entry in which the accessed bit is set.
Definition 3.21 (Translation Request). A translation request from the processor core is defined
as a triple
\[ TRq \stackrel{\text{def}}{\equiv} (asid \in \mathbb{B}^{6},\ va \in \mathbb{B}^{32},\ r \in \mathbb{B}^{3}) \]
consisting of the address space identifier asid, the virtual address va, and the access rights r, which
depend on the operation performed by the core.
Definition 3.22 (TLB Hit). When a walk w ∈ Cwalk matches a translation request trq ∈ TRq in
terms of virtual page address, address space identifier and access rights, we call this a TLB hit:
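\[ hit(trq, w) \stackrel{\text{def}}{\equiv} \begin{aligned}[t]
&\text{(i)}\ \ w.vpa = trq.va.vpa \\
&\text{(ii)}\ \ w.asid = trq.asid \\
&\text{(iii)}\ \ trq.r \le w.r
\end{aligned} \]
Note that a hit may be on an incomplete walk.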
Definition 3.23 (Page-Faulting Walk Extension). A page fault for a given translation request
trq ∈ TRq can occur for a given walk w ∈ Cwalk when its walk extension using a page table
entry residing in memory m would result in a fault.
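\[ fault(m, trq, w) \stackrel{\text{def}}{\equiv} \begin{aligned}[t]
&\text{(i)}\ \ hit(trq, w) \\
&\text{(ii)}\ \ \lnot complete(w) \\
&\text{(iii)}\ \ wext(w, pte(m, w), trq.r).fault
\end{aligned} \]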
Though page faults may occur at any translation level, according to [Sch13] the TLB in the
MIPS-86 model keeps only non-faulting walks.
Definition 3.24 (Physical Memory Address – Translation Result). For a complete walk w ∈
Cwalk and a virtual address va ∈ B32 we define the result of the translation, i.e. the physical
memory address, as
\[ pma(w, va) \stackrel{\text{def}}{\equiv} w.pa \circ va.px_0 \]
How exactly the page faults are triggered and how the MMU operations affect the single
core MIPS-86 machine configuration will be formalized in the top-level transition function in
Section 3.2.5. For now, we define the TLB transition function only for basic operations on its
configuration.
Definition 3.25 (Transition Function of the TLB). The transition function of the TLB
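\[ \delta_{tlb} : C_{tlb} \times \Sigma_{tlb} \to C_{tlb} \]
is defined for the input alphabet
\[ \Sigma_{tlb} \stackrel{\text{def}}{\equiv} \{\text{flush}\} \times \mathbb{B}^{6} \times \mathbb{B}^{20} \ \cup\ \{\text{flush-incomplete}\} \ \cup\ \{\text{add-walk}\} \times C_{walk} \]
such that, depending on the given input, the TLB performs the following operations:
• flushing a virtual page address for a given address space identifier
\[ \delta_{tlb}(tlb, (\text{flush}, asid, vpa)) \stackrel{\text{def}}{\equiv} \{ w \in tlb \mid \lnot (w.asid = asid \land w.vpa = vpa) \} \]
• flushing all incomplete walks from the TLB
\[ \delta_{tlb}(tlb, \text{flush-incomplete}) \stackrel{\text{def}}{\equiv} \{ w \in tlb \mid complete(w) \} \]
• adding a walk
\[ \delta_{tlb}(tlb, (\text{add-walk}, w)) \stackrel{\text{def}}{\equiv} tlb \cup \{ w \} \]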
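Since the TLB is modelled as a plain set of walks, its transition function translates almost literally into set comprehensions. The sketch below is a hedged illustration: the Walk tuple, the string tags, and the encoding of inputs as Python tuples are assumptions of this example.

```python
from typing import NamedTuple

class Walk(NamedTuple):
    vpa: int
    asid: int
    level: int
    pa: int
    r: int
    fault: bool

def delta_tlb(tlb, inp):
    """tlb: frozenset of walks; inp: ('flush', asid, vpa), ('flush-incomplete',) or ('add-walk', w)."""
    kind = inp[0]
    if kind == "flush":
        _, asid, vpa = inp
        return frozenset(w for w in tlb if not (w.asid == asid and w.vpa == vpa))
    if kind == "flush-incomplete":
        return frozenset(w for w in tlb if w.level == 0)   # keep complete walks only
    if kind == "add-walk":
        return tlb | {inp[1]}
    raise ValueError(f"unknown TLB input: {kind}")
```

Note that a flush removes complete and incomplete walks alike for the given pair of address space identifier and virtual page address.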
3.2.4 Processor Core
3.2.4.1 Configuration and Transition Overview
Definition 3.26 (Processor Core Configuration). A MIPS-86 processor core configuration
\[ C_{core} \stackrel{\text{def}}{\equiv} (pc \in \mathbb{B}^{32},\ gpr : \mathbb{B}^{5} \to \mathbb{B}^{32},\ spr : \mathbb{B}^{5} \to \mathbb{B}^{32}) \]
consists of the program counter pc, the general purpose register file gpr, and the special purpose
register file spr.
Definition 3.27 (Current Address Space Identifier and Mode). The address space identifier in
the core configuration c ∈ Ccore is given by the six least significant bits of the special purpose register asid:
\[ asid(c) \stackrel{\text{def}}{\equiv} c.spr(asid)[5:0] \]
The mode and exception mode of the processor core are indicated by the least significant bit of the
corresponding SPR:
\[ mode(c) \stackrel{\text{def}}{\equiv} c.spr(mode)[0] \qquad emode(c) \stackrel{\text{def}}{\equiv} c.spr(emode)[0] \]
Definition 3.28 (Processor Core Transition Function). We define the processor core transition
function
\[ \delta_{core} : C_{core} \times \Sigma_{core} \to C_{core} \]
which takes a processor core input from
\[ \Sigma_{core} \stackrel{\text{def}}{\equiv} \Sigma_{instr} \times \Sigma_{eev} \times \mathbb{B} \times \mathbb{B} \]
where
\[ \Sigma_{instr} \stackrel{\text{def}}{\equiv} \mathbb{B}^{32} \times \mathbb{B}^{32} \]
is a set of inputs required for instruction execution, i.e. a pair of an instruction word I ∈ B32
and a value R ∈ B32 read from memory in case of lw or cas instructions, and
\[ \Sigma_{eev} \stackrel{\text{def}}{\equiv} \mathbb{B}^{256} \]
is used to represent a vector eev ∈ B256 of external interrupt signals. Moreover, we explicitly
pass the page fault on fetch and page fault on load/store signals pff, pfls ∈ B.
Then, the processor core transition function is defined by a case split on signal jisr (jump to
the interrupt service routine) indicating whether an interrupt is triggered in the current step of
the machine and the signal eret (return from exception) which is active when the instruction I
to be executed is eret.
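\[ \delta_{core}(c, I, R, eev, pff, pfls) \stackrel{\text{def}}{\equiv} \begin{cases}
\delta_{jisr}(c, I, R, eev, pff, pfls) & : jisr(c, I, eev, pff, pfls) \\
\delta_{eret}(c) & : \lnot jisr(c, I, eev, pff, pfls) \land eret(I) \\
\delta_{instr}(c, I, R) & : \text{otherwise}
\end{cases} \]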
In the definition above, we use the auxiliary transition functions that will be considered in
detail in this section:
• execution of a non-interrupted instruction
\[ \delta_{instr} : C_{core} \times \Sigma_{instr} \to C_{core} \]
• computation of the next state of the core when an interrupt is triggered
\[ \delta_{jisr} : C_{core} \times \Sigma_{core} \to C_{core} \]
• return from exception
\[ \delta_{eret} : C_{core} \to C_{core} \]
3.2.4.2 Instruction Execution
First, we introduce auxiliary definitions that will allow us to consider the semantics of the in-
struction execution by the processor core.
Instruction Decoding
Definition 3.29 (Fields of the Instruction Layout). We define formally the types and fields of
instructions according to the MIPS-86 instruction set and the instruction word layout given in
Section 3.1.1:
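• instruction opcode
\[ opc(I) \stackrel{\text{def}}{\equiv} I[31:26] \]
• instruction type
\[ \begin{aligned}
rtype(I) &\stackrel{\text{def}}{\equiv} opc(I) = 0^{6} \lor opc(I) = 010^{4} \\
jtype(I) &\stackrel{\text{def}}{\equiv} opc(I) = 0^{4}10 \lor opc(I) = 0^{4}11 \\
itype(I) &\stackrel{\text{def}}{\equiv} \lnot (rtype(I) \lor jtype(I))
\end{aligned} \]
• register addresses
\[ rs(I) \stackrel{\text{def}}{\equiv} I[25:21] \qquad rt(I) \stackrel{\text{def}}{\equiv} I[20:16] \qquad rd(I) \stackrel{\text{def}}{\equiv} I[15:11] \]
• shift amount
\[ sa(I) \stackrel{\text{def}}{\equiv} I[10:6] \]
• function code (used only for R-type instructions)
\[ fun(I) \stackrel{\text{def}}{\equiv} I[5:0] \]
• immediate constant and instruction index (for I-type and J-type instructions, respectively)
\[ imm(I) \stackrel{\text{def}}{\equiv} I[15:0] \qquad iindex(I) \stackrel{\text{def}}{\equiv} I[25:0] \]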
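The field layout of Definition 3.29 corresponds to simple shift-and-mask operations on a 32-bit instruction word. The following sketch assumes instruction words are given as Python integers; the helper fields and the dictionary it returns are illustrative choices, while the bit ranges and opcode constants are taken from the definition.

```python
def fields(I):
    """Extract the layout fields of a 32-bit instruction word I."""
    return {
        "opc": (I >> 26) & 0x3F,     # I[31:26]
        "rs": (I >> 21) & 0x1F,      # I[25:21]
        "rt": (I >> 16) & 0x1F,      # I[20:16]
        "rd": (I >> 11) & 0x1F,      # I[15:11]
        "sa": (I >> 6) & 0x1F,       # I[10:6]
        "fun": I & 0x3F,             # I[5:0]
        "imm": I & 0xFFFF,           # I[15:0]
        "iindex": I & 0x3FFFFFF,     # I[25:0]
    }

def rtype(I):
    return fields(I)["opc"] in (0b000000, 0b010000)

def jtype(I):
    return fields(I)["opc"] in (0b000010, 0b000011)

def itype(I):
    return not (rtype(I) or jtype(I))
```

For example, the word 0x00221820 decodes as an R-type instruction with rs = 1, rt = 2, rd = 3 and function code 100000, which is the add predicate of Definition 3.30.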
Definition 3.30 (Instruction Decoding Predicates). For every mnemonic mn of a MIPS instruc-
tion I (see MIPS ISA-tables at the beginning of the chapter) we define a predicate mn(I) which
is true, if I is an mn instruction. Formally, the predicates check for the corresponding opcode
and function code. E.g.
lw(I) ≡ opc(I) = 100011
add(I) ≡ rtype(I) ∧ fun(I) = 100000
The remaining predicates associated with the mnemonics are derived in the same obvious way.
Definition 3.31 (Undefined ISA Instruction). By inspection of the tables we define a predicate
undef(I) which is true if an instruction is not defined in the considered MIPS-86 ISA. This predicate
is essentially the negation of the disjunction of all defined instruction decoding predicates.
Definition 3.32 (Illegal Instruction). We consider an instruction I to be illegal for the core in
configuration c ∈ Ccore if this instruction does not belong to the instruction set or if, in user mode,
there is an attempt to execute one of the coprocessor instructions or to flush the TLB:
\[ ill(c, I) \stackrel{\text{def}}{\equiv} undef(I) \lor \big( mode(c) \land (movg2s(I) \lor movs2g(I) \lor eret(I) \lor flush(I) \lor invlpg(I)) \big) \]
alucon[3:0]   i   alures                          ovf
0 000         *   a +_32 b                        0
0 001         *   a +_32 b                        [a] + [b] ∉ T_32
0 010         *   a −_32 b                        0
0 011         *   a −_32 b                        [a] − [b] ∉ T_32
0 100         *   a ∧_32 b                        0
0 101         *   a ∨_32 b                        0
0 110         *   a ⊕_32 b                        0
0 111         0   ¬_32 (a ∨_32 b)                 0
0 111         1   b[15 : 0] ◦ 0^16                0
1 010         *   0^31 ◦ ([a] < [b] ? 1 : 0)      0
1 011         *   0^31 ◦ (⟨a⟩ < ⟨b⟩ ? 1 : 0)      0
Table 3.6: Specification of ALU Operations
Operations on the translation lookaside buffer in user mode are forbidden for user processes
running under an operating system because the OS is responsible for virtualization of user
address spaces and the address translation is not visible from the process point of view. In
the case of a hypervisor, by contrast, a guest operating system running in user mode on the host
machine must be able to perform operations on the TLB. Since the MIPS-86 model
does not support hardware virtualization, the hypervisor has to emulate such guest operations
by implementing, e.g., the shadow page table mechanism [Kov13].
ALU Operations
The result of arithmetic, logical, and test-and-set operations is computed by the arithmetic
logic unit (ALU) of the MIPS-86 model.
Definition 3.33 (ALU Specification). For two operands a, b ∈ B32, control bits alucon ∈ B4
and a flag i ∈ B, such that alucon and i represent an ALU operation, we define the result
alures(a, b, alucon, i) ∈ B32 of the operation and the overflow flag ovf(a, b, alucon) ∈ B accord-
ing to Table 3.6.
Definition 3.34 (ALU Instruction Predicates). To describe whether a given instruction I ∈ B32
performs an ALU operation, we define the following predicates:
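• I-type ALU instruction: \( alui(I) \stackrel{\text{def}}{\equiv} itype(I) \land I[31:29] = 001 \)
• R-type ALU instruction: \( alur(I) \stackrel{\text{def}}{\equiv} rtype(I) \land I[5:4] = 10 \)
• any ALU instruction: \( alu(I) \stackrel{\text{def}}{\equiv} alui(I) \lor alur(I) \)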
Definition 3.35 (ALU Operands of Instruction). Using the ISA tables, we specify the right and
left operands of an ALU instruction I ∈ B32 for a given processor core configuration c ∈ Ccore
as follows:
• left ALU operand: \( lop(c, I) \stackrel{\text{def}}{\equiv} c.gpr(rs(I)) \)
• right ALU operand:
\[ rop(c, I) \stackrel{\text{def}}{\equiv} \begin{cases}
c.gpr(rt(I)) & : rtype(I) \\
sxt_{32}(imm(I)) & : \lnot rtype(I) \land \lnot I[28] \\
zxt_{32}(imm(I)) & : \text{otherwise}
\end{cases} \]
Definition 3.36 (ALU Control Bits of Instruction). We define the ALU control bits of an in-
struction I ∈ B32 as
\[ alucon(I)[2:0] \stackrel{\text{def}}{\equiv} \begin{cases} I[2:0] & : rtype(I) \\ I[28:26] & : \text{otherwise} \end{cases} \]
\[ alucon(I)[3] \stackrel{\text{def}}{\equiv} (rtype(I) \land I[3]) \lor (\lnot I[28] \land I[27]) \]
Definition 3.37 (ALU Computation Result). The ALU result of an arithmetic, logical,
or test-and-set instruction I executed by the processor core in a configuration c ∈ Ccore is
then defined as
\[ compres(c, I) \stackrel{\text{def}}{\equiv} alures(lop(c, I), rop(c, I), alucon(I), itype(I)) \]
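Table 3.6 can be read as a small executable specification. The sketch below is one possible rendering, under the assumption that 32-bit words are Python integers in the range 0 to 2^32 − 1 and that [a] denotes the two's complement value; the helper signed and the constant MASK are introduced only for this example.

```python
MASK = 0xFFFFFFFF

def signed(x):
    """Two's complement value [x] of a 32-bit word."""
    return x - (1 << 32) if x & 0x80000000 else x

def alures(a, b, alucon, i):
    """ALU result for control bits alucon (4 bits) and I-type flag i, following Table 3.6."""
    hi, op = alucon >> 3, alucon & 0b111
    if hi == 0:
        if op in (0b000, 0b001):
            return (a + b) & MASK
        if op in (0b010, 0b011):
            return (a - b) & MASK
        if op == 0b100:
            return a & b
        if op == 0b101:
            return a | b
        if op == 0b110:
            return a ^ b
        # op == 0b111: load upper immediate for I-type, nor otherwise
        return (b << 16) & MASK if i else ~(a | b) & MASK
    if op == 0b010:
        return int(signed(a) < signed(b))
    if op == 0b011:
        return int(a < b)
    raise ValueError("ALU operation not specified in Table 3.6")

def ovf(a, b, alucon):
    """Overflow flag: only the signed add (0001) and sub (0011) rows can overflow."""
    if alucon == 0b0001:
        s = signed(a) + signed(b)
    elif alucon == 0b0011:
        s = signed(a) - signed(b)
    else:
        return False
    return not (-(1 << 31) <= s < (1 << 31))
```

The two comparison rows differ only in the interpretation of the operands: the word 0xFFFFFFFF is smaller than 1 as a two's complement number but larger as a binary number.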
Jump and Branch Instructions
The jump and branch instructions of the MIPS-86 ISA influence the program control flow by
changing the program counter of the processor core to a new address value. In contrast
to ordinary jumps, branch instructions set the program counter only when a
certain condition on a value of a general purpose register holds.
Definition 3.38 (Branch Condition Evaluation). Depending on a comparison operation determined
by control bits bcon ∈ B4, we define the function bcres(a, b, bcon) ∈ B for the branch
condition evaluation comparing a parameter a ∈ B32 with zero or with a second parameter
b ∈ B32 in the following way:
\[ bcres(a, b, bcon) \stackrel{\text{def}}{\equiv} \begin{cases}
[a] < 0 & : bcon = 0010 \\
[a] \ge 0 & : bcon = 0011 \\
a = b & : bcon[3:1] = 100 \\
a \ne b & : bcon[3:1] = 101 \\
[a] \le 0 & : bcon[3:1] = 110 \\
[a] > 0 & : bcon[3:1] = 111 \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
Definition 3.39 (Branch Instruction Predicates). The following branch instruction predicates
denote whether a given instruction I ∈ B32 is a jump or successful branch instruction in a given
core configuration c ∈ Ccore:
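• branch instruction: \( b(I) \stackrel{\text{def}}{\equiv} opc(I)[5:3] = 0^{3} \land itype(I) \)
• jump instruction: \( jump(I) \stackrel{\text{def}}{\equiv} j(I) \lor jal(I) \lor jr(I) \lor jalr(I) \)
• jump or branch taken:
\[ jbtaken(c, I) \stackrel{\text{def}}{\equiv} jump(I) \lor \big( b(I) \land bcres(c.gpr(rs(I)), c.gpr(rt(I)), opc(I)[2:0] \circ rt(I)[0]) \big) \]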
Definition 3.40 (Branch Target). Then the target address of a jump or successful branch instruc-
tion I ∈ B32 in a given core configuration c ∈ Ccore is computed as
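\[ btarget(c, I) \stackrel{\text{def}}{\equiv} \begin{cases}
c.pc +_{32} sxt_{30}(imm(I)) \circ 00 & : b(I) \\
c.gpr(rs(I)) & : jr(I) \lor jalr(I) \\
(c.pc +_{32} 4_{32})[31:28] \circ iindex(I) \circ 00 & : j(I) \lor jal(I)
\end{cases} \]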
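The three cases of the branch target computation can be checked with a short sketch. It assumes that words are Python integers and that the caller has already classified the instruction; the parameter kind and its string values are hypothetical names used only here in place of the decoding predicates.

```python
MASK32 = 0xFFFFFFFF

def sxt16(imm):
    """Two's complement value of a 16-bit immediate."""
    return imm - 0x10000 if imm & 0x8000 else imm

def btarget(pc, kind, imm=0, iindex=0, gpr_rs=0):
    """Target address for kind in {'branch', 'register', 'jump'}."""
    if kind == "branch":      # c.pc + sxt30(imm) ◦ 00
        return (pc + (sxt16(imm) << 2)) & MASK32
    if kind == "register":    # jr / jalr: target taken from gpr(rs)
        return gpr_rs
    if kind == "jump":        # j / jal: upper 4 bits of pc + 4, then iindex ◦ 00
        return (((pc + 4) & MASK32) & 0xF0000000) | (iindex << 2)
    raise ValueError(kind)
```

Note that, following the definition, the branch offset is added to the address of the branch instruction itself.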
Shift Operations
The result of a shift instruction is computed by a shift unit performing operations on values
from the general purpose registers.
Definition 3.41 (Shift Operations). For a bit-string a ∈ Bn and a shift distance i ∈ {0, . . . , n − 1}
we define the following shift operations available in the MIPS-86 ISA:
• shift left logical: \( sll(a, i) \stackrel{\text{def}}{\equiv} a[n-i-1:0] \circ 0^{i} \)
• shift right logical: \( srl(a, i) \stackrel{\text{def}}{\equiv} 0^{i} \circ a[n-1:i] \)
• shift right arithmetic: \( sra(a, i) \stackrel{\text{def}}{\equiv} a_{n-1}^{i} \circ a[n-1:i] \)
Note that for MIPS-86 we will use the aforementioned definitions only for n = 32.
Definition 3.42 (Shift Unit Specification). For inputs a ∈ Bn, i ∈ {0, . . . , n − 1}, and a shift
operation represented by bits sf ∈ B2 (shift function), we define the result of the operation as
follows:
\[ sures(a, i, sf) \stackrel{\text{def}}{\equiv} \begin{cases}
sll(a, i) & : sf = 00 \\
srl(a, i) & : sf = 10 \\
sra(a, i) & : sf = 11 \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
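The shift operations of Definitions 3.41 and 3.42 can be sketched on integers as follows. The integer encoding of bit-strings and the default width n = 32 are assumptions of this example; the function names follow the definitions.

```python
def sll(a, i, n=32):
    """Shift left logical: drop the i upper bits, append i zeros."""
    return (a << i) & ((1 << n) - 1)

def srl(a, i, n=32):
    """Shift right logical: prepend i zeros."""
    return a >> i

def sra(a, i, n=32):
    """Shift right arithmetic: prepend i copies of the sign bit a[n-1]."""
    sign = (a >> (n - 1)) & 1
    fill = (((1 << i) - 1) << (n - i)) if sign else 0
    return (a >> i) | fill

def sures(a, i, sf, n=32):
    """Shift unit result; sf = 0b01 is undefined in the specification."""
    ops = {0b00: sll, 0b10: srl, 0b11: sra}
    if sf not in ops:
        raise ValueError("undefined shift function")
    return ops[sf](a, i, n)
```

The logical and arithmetic right shifts agree on words whose sign bit is 0 and differ only in the bits shifted in for negative values.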
Definition 3.43 (Shift Instruction Predicate). The predicate su(I) denotes whether an instruc-
tion I ∈ B32 is a shift instruction
\[ su(I) \stackrel{\text{def}}{\equiv} sll(I) \lor srl(I) \lor sra(I) \lor sllv(I) \lor srlv(I) \lor srav(I) \]
Definition 3.44 (Shift Operands). Given a shift instruction I ∈ B32 and a processor core
configuration c ∈ Ccore, we define the following shift operands:
• shift distance:
\[ sdist(c, I) \stackrel{\text{def}}{\equiv} \begin{cases} \langle sa(I) \rangle & : fun(I)[3] = 0 \\ \langle c.gpr(rs(I))[4:0] \rangle & : fun(I)[3] = 1 \end{cases} \]
• shift left operand: \( slop(c, I) \stackrel{\text{def}}{\equiv} c.gpr(rt(I)) \)
Definition 3.45 (Shift Function). The shift function of a shift instruction I ∈ B32 is given by
\[ sf(I) \stackrel{\text{def}}{\equiv} I[1:0] \]
Definition 3.46 (Shift Unit Computation Result). The result of a shift instruction I ∈ B32
computed by the shift unit of the processor core in a configuration c ∈ Ccore is defined as
\[ shiftres(c, I) \stackrel{\text{def}}{\equiv} sures(slop(c, I), sdist(c, I), sf(I)) \]
Memory Access
In order to define how values are read/written from/to the memory system in the single core
MIPS-86 transition function we provide a few auxiliary definitions.
Definition 3.47 (Memory Instruction Predicates). For an instruction I ∈ B32 we define the
following predicates:
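• memory load instruction: \( load(I) \stackrel{\text{def}}{\equiv} lw(I) \lor cas(I) \)
• memory store instruction: \( store(I) \stackrel{\text{def}}{\equiv} sw(I) \lor locksw(I) \lor cas(I) \)
• memory instruction: \( mem(I) \stackrel{\text{def}}{\equiv} load(I) \lor store(I) \)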
Definition 3.48 (Effective Address). The effective address ea(c, I) at which the memory system
is accessed by a memory instruction I ∈ B32 executed by the processor core in configuration
c ∈ Ccore is computed as
\[ ea(c, I) \stackrel{\text{def}}{\equiv} \begin{cases} c.gpr(rs(I)) +_{32} sxt_{32}(imm(I)) & : itype(I) \\ c.gpr(rs(I)) & : rtype(I) \end{cases} \]
A byte address of a memory access must be correctly aligned, i.e., divisible by the width of
the access, which is simply a word (4 bytes) for instruction fetch and data load/store in the
considered MIPS-86 ISA. Otherwise, the corresponding interrupt is triggered in the processor
core.
Definition 3.49 (Data and Instruction Misalignment Predicates). For an instruction I ∈ B32
and a processor core configuration c ∈ Ccore, we define the predicate
\[ dmal(c, I) \stackrel{\text{def}}{\equiv} mem(I) \land ea(c, I)[1:0] \ne 00 \]
indicating whether the data memory access is misaligned. Analogously, the predicate
\[ imal(c) \stackrel{\text{def}}{\equiv} c.pc[1:0] \ne 00 \]
shows that the program counter for the instruction fetch is not word aligned.
Definition 3.50 (Store Value). Given a memory store instruction I ∈ B32 and a processor core
configuration c ∈ Ccore, the store value is a word taken from the general purpose register specified
by rt(I):
\[ sv(c, I) \stackrel{\text{def}}{\equiv} c.gpr(rt(I)) \]
GPRs Update
Definition 3.51 (General Purpose Register Write Predicate). The predicate
\[ gprw(I) \stackrel{\text{def}}{\equiv} alu(I) \lor su(I) \lor load(I) \lor jal(I) \lor jalr(I) \lor movs2g(I) \]
describes that a given instruction I ∈ B32 writes to some general purpose register.
Definition 3.52 (General Purpose Register Result Destination). We define the result destina-
tion cad(I) of an instruction I ∈ B32 writing to a general purpose register as an address of this
register:
\[ cad(I) \stackrel{\text{def}}{\equiv} \begin{cases}
1^{5} & : jal(I) \\
rd(I) & : rtype(I) \land \lnot movs2g(I) \\
rt(I) & : \text{otherwise}
\end{cases} \]
Definition 3.53 (General Purpose Register Input). A value written by an instruction I ∈ B32 to
the general purpose register cad(I) in a processor core configuration c ∈ Ccore is computed as
\[ gprdin(c, I, R) \stackrel{\text{def}}{\equiv} \begin{cases}
c.pc +_{32} 4_{32} & : jal(I) \lor jalr(I) \\
R & : load(I) \\
c.spr(rd(I)) & : movs2g(I) \\
compres(c, I) & : alu(I) \\
shiftres(c, I) & : su(I)
\end{cases} \]
where R ∈ B32 is a word read by the memory instruction from the memory system.
Instruction Execution Transition Function
Using the definitions from above we can now define the transition function for the instruction
execution by the processor core.
Definition 3.54 (Non-Interrupted Instruction Execution). For a processor core in a configu-
ration c ∈ Ccore, a legal instruction I ∈ B32, and a data word R ∈ B32 read from the memory
system when needed we define the result c′ ∈ Ccore of the non-interrupted instruction execution
as
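\[ \delta_{instr}(c, I, R) \stackrel{\text{def}}{\equiv} c' \]
such that the processor core components are updated in the following way:
• \( c'.pc = \begin{cases} btarget(c, I) & : jbtaken(c, I) \\ c.pc +_{32} 4_{32} & : \text{otherwise} \end{cases} \)
• \( c'.gpr(x) = \begin{cases} gprdin(c, I, R) & : x = cad(I) \land gprw(I) \\ c.gpr(x) & : \text{otherwise} \end{cases} \)
• \( c'.spr(x) = \begin{cases} c.gpr(rt(I)) & : rd(I) = x \land movg2s(I) \\ c.spr(x) & : \text{otherwise} \end{cases} \)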
3.2.4.3 Interrupt Handling
In Table 3.5 we have already shown the interrupts ordered by their priority. Among those, the
MIPS-86 model supports the external interrupts eev ∈ B256 such that eev[0] is the signal reset
and eev[255 : 1] are interrupts from devices. This kind of interrupt plays a role in the full model
with external devices as well as the local APIC providing these external events to the processor
core. Since these hardware components are outside the scope of the thesis, we
do not include them in our model for simplicity. The APIC model for MIPS-86 is described in
detail in [Sch13].
Triggering of Interrupts
While external event signals serve as an input to the processor core transition function,
the internal event signals are triggered inside the processor.
Definition 3.55 (Internal Event Vector). Given a processor core configuration c ∈ Ccore, an
instruction I ∈ B32 to be executed if possible, and the page fault signals pff, pfls ∈ B provided
by the MMU to the processor core, we define the computation of the internal event signals
iev(c, I, pff, pfls) ∈ B8 as follows:
\[ iev(c, I, pff, pfls)[i] \stackrel{\text{def}}{\equiv} \begin{cases}
\lnot imal(c) \land \lnot pff \land ill(c, I) & : i = 2 \\
imal(c) \lor (\lnot imal(c) \land \lnot pff \land dmal(c, I)) & : i = 3 \\
pff \land \lnot imal(c) & : i = 4 \\
pfls \land \lnot imal(c) \land \lnot pff \land \lnot dmal(c, I) & : i = 5 \\
\lnot imal(c) \land \lnot pff \land sysc(I) & : i = 6 \\
\lnot imal(c) \land \lnot pff \land alu(I) \land ovf(lop(c, I), rop(c, I), alucon(I)) & : i = 7 \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
Note that although the page fault signals appear as external inputs in the processor core transition
function, they originate from the MMU belonging to the processor and are treated as internal.
When an interrupt occurs, the information about this interrupt is saved in a special purpose
register to allow a programmer to determine the reason (or cause) of the interrupt.
Definition 3.56 (Cause and Masked Cause of Interrupts). Given a processor core configuration
c ∈ Ccore, an instruction I ∈ B32 to be executed, an external event vector eev ∈ B256, and page
fault signals pff, pfls ∈ B, we can define the cause ca ∈ B8 and masked cause mca ∈ B8 of
interrupts in the following way:
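\[ ca(c, I, eev, pff, pfls)[j] \stackrel{\text{def}}{\equiv} \begin{cases}
eev[0] & : j = 0 \\
\bigvee_{i=1}^{255} eev[i] & : j = 1 \\
iev(c, I, pff, pfls)[j] & : j \in [2:7]
\end{cases} \]
\[ mca(c, I, eev, pff, pfls)[j] \stackrel{\text{def}}{\equiv} \begin{cases}
ca(c, I, eev, pff, pfls)[j] \land c.spr(sr)[j] & : j \in \{1, 7\} \\
ca(c, I, eev, pff, pfls)[j] & : \text{otherwise}
\end{cases} \]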
Definition 3.57 (Jump to Interrupt Service Routine Predicate). To denote that in a given con-
figuration c ∈ Ccore for a given instruction I ∈ B32, external event signals eev ∈ B256, and page
fault signals pff, pfls ∈ B an interrupt is triggered, we define the predicate
\[ jisr(c, I, eev, pff, pfls) \stackrel{\text{def}}{\equiv} \bigvee_{j} mca(c, I, eev, pff, pfls)[j] \]
Definition 3.58 (Interrupt Level of Triggered Interrupt). To determine the interrupt level of
the triggered interrupt, we define the function
\[ il(c, I, eev, pff, pfls) \stackrel{\text{def}}{\equiv} \min \{ j \mid mca(c, I, eev, pff, pfls)[j] = 1 \} \]
Definition 3.59 (Continue Type Interrupt Predicate). The predicate
\[ cont(c, I, eev, pff, pfls) \stackrel{\text{def}}{\equiv} il(c, I, eev, pff, pfls) \in \{6, 7\} \]
shows whether the triggered interrupt is of continue type.
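The chain from cause to masked cause, interrupt level, and continue type can be sketched compactly. In this illustration the event vectors are Python lists of booleans, the internal event vector is passed in as a dictionary for levels 2 to 7 instead of being computed from the core state, and sr is the status register as an integer; these encodings are assumptions of the example.

```python
def ca(eev, iev):
    """Cause vector: reset, any device interrupt, then the internal events 2..7."""
    return [eev[0], any(eev[1:])] + [iev[j] for j in range(2, 8)]

def mca(cause, sr):
    """Masked cause: only levels 1 and 7 are maskable via the status register."""
    masked = []
    for j, bit in enumerate(cause):
        if j in (1, 7):
            masked.append(bit and bool((sr >> j) & 1))
        else:
            masked.append(bit)
    return masked

def jisr(masked):
    return any(masked)

def il(masked):
    """Interrupt level: the smallest index of an active masked cause bit."""
    return min(j for j, bit in enumerate(masked) if bit)

def cont(masked):
    return il(masked) in (6, 7)
```

If a device interrupt and a system call coincide, the outcome depends on the status register: with level 1 masked the system call (level 6, continue type) is taken, otherwise the device interrupt at level 1 wins.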
Transition Functions for Exception and Return from Exception
Definition 3.60 (Interrupt Execution Transition Function). For a processor core in a configura-
tion c ∈ Ccore, an instruction I ∈ B32 to be executed, external events eev ∈ B256, and page fault
signals pff, pfls ∈ B provided by the processor’s MMU we define the computation of the next
processor core configuration c′ ∈ Ccore in case of an interrupt as
\[ \delta_{jisr}(c, I, R, eev, pff, pfls) \stackrel{\text{def}}{\equiv} c' \]
such that
• \( c'.pc = 0^{32} \)
• \( c'.spr(x) = \begin{cases}
0^{32} & : x = sr \\
0^{32} & : x = mode \\
c.spr(mode) & : x = emode \\
c.spr(sr) & : x = esr \\
zxt_{32}(mca(c, I, eev, pff, pfls)) & : x = eca \\
c.pc & : x = epc \land \lnot cont(c, I, eev, pff, pfls) \\
\delta_{instr}(c, I, R).pc & : x = epc \land cont(c, I, eev, pff, pfls) \\
ea(c, I) & : x = edata \land il(c, I, eev, pff, pfls) = 5 \\
bin_{32}(\min \{ j \mid eev[j] = 1 \}) & : x = edata \land il(c, I, eev, pff, pfls) = 1 \\
c.spr(x) & : \text{otherwise}
\end{cases} \)
• \( c'.gpr = \begin{cases} c.gpr & : \lnot cont(c, I, eev, pff, pfls) \\ \delta_{instr}(c, I, R).gpr & : \text{otherwise} \end{cases} \)
Note that in case of the page fault on fetch the program counter is saved in epc and there is no
need to use edata additionally. Moreover, in case of an interrupt of the type continue, the core
needs to finish the operation and save the computation result to the GPRs.
To resume execution after interrupt handling, the MIPS-86 ISA provides the
instruction eret.
Definition 3.61 (Return From Exception Transition Function). For a processor core executing
the instruction eret in a configuration c ∈ Ccore we compute the core’s next configuration c′ ∈
Ccore as follows:
\[ \delta_{eret}(c) \stackrel{\text{def}}{\equiv} c' \]
• \( c'.pc = c.spr(epc) \)
• \( c'.spr(x) = \begin{cases} c.spr(emode) & : x = mode \\ c.spr(esr) & : x = sr \\ c.spr(x) & : \text{otherwise} \end{cases} \)
• \( c'.gpr = c.gpr \)
Note also that, in contrast to [Sch13], our MIPS-86 model allows returning exactly to the
mode in which an interrupt occurred.
3.2.5 Single Core MIPS-86 ISA Transitions
Definition 3.62 (Single Core MIPS-86 Transition Function). Transitions of the single core ma-
chine are defined by the function
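\[ \delta_{MIPS} : C_{MIPS} \times \Sigma_{MIPS} \rightharpoonup C_{MIPS} \]
where
\[ \begin{aligned}
\Sigma_{MIPS} \stackrel{\text{def}}{\equiv}\ & \{\text{core}\} \times C_{walk} \times C_{walk} \times \mathbb{B}^{256}\ \cup \\
& \{\text{tlb-create}\} \times \mathbb{B}^{20}\ \cup \\
& \{\text{tlb-extend}\} \times C_{walk} \times \mathbb{B}^{3}\ \cup \\
& \{\text{tlb-accessed}\} \times C_{walk}\ \cup \\
& \{\text{sb}\}
\end{aligned} \]
are processor inputs specifying which processor component makes a certain kind of step. Active
steps can be made by the core, the TLB, and the SB.
Now we define the single core MIPS-86 transition function by a case distinction on a given
input in ∈ ΣMIPS
\[ \delta_{MIPS}(c, in) \stackrel{\text{def}}{\equiv} c' \]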
Any component of the configuration c′ ∈ CMIPS not listed explicitly in the following has the
same configuration as in c ∈ CMIPS.
3.2.5.1 Processor Core Step
The processor core step is indicated by the input
\[ in = (\text{core}, w_I, w_D, eev) \]
containing the external event vector and the TLB walks \( w_I \) for the instruction fetch and \( w_D \) for the data
access (memory or store buffer) in translated mode. These walks are ignored in system
mode.
For c ∈ CMIPS, core ≡ c.cpu.core, and ms(c) ≡ ms(c.cpu.sb, c.m) we introduce the following
definitions and corresponding shorthands:
• the translation request for instruction fetch in user mode
\[ trqI \equiv trqI(core) \stackrel{\text{def}}{\equiv} (asid(core),\ core.pc,\ 0 \circ mode(core) \circ 1) \]
• a predicate indicating whether the page fault on fetch occurs for the given walk and
translation request
\[ pff \equiv pff(c, w_I) \stackrel{\text{def}}{\equiv} mode(core) \land fault(c.m, trqI, w_I) \]
• the physical memory address for instruction fetch, meaningful in user mode only in case
of no page fault on fetch
\[ pmaI \equiv pmaI(c, w_I) \stackrel{\text{def}}{\equiv} \begin{cases} pma(w_I, core.pc) & : mode(core) \\ core.pc & : \lnot mode(core) \end{cases} \]
• the instruction fetched from the memory in the absence of the page fault on fetch
\[ I \equiv I(c, w_I) \stackrel{\text{def}}{\equiv} ms_4(c)(pmaI) \]
• the translation request for data access at the effective address
\[ trqD \equiv trqD(core, I) \stackrel{\text{def}}{\equiv} (asid(core),\ ea(core, I),\ store(I) \circ mode(core) \circ 0) \]
• a predicate indicating the page fault on load/store for the given walks and translation
request
\[ pfls \equiv pfls(c, w_D, w_I, I) \stackrel{\text{def}}{\equiv} mode(core) \land \lnot pff \land mem(I) \land fault(c.m, trqD, w_D) \]
• the physical memory address for the data access, relevant if the instruction can be fetched
successfully and no page fault on load/store occurs
\[ pmaD \equiv pmaD(c, w_D, I) \stackrel{\text{def}}{\equiv} \begin{cases} pma(w_D, ea(core, I)) & : mode(core) \\ ea(core, I) & : \lnot mode(core) \end{cases} \]
• data read from the memory system in case of lw or cas instructions
\[ R \equiv R(c, w_D, I) \stackrel{\text{def}}{\equiv} ms_4(c)(pmaD) \]
• shorthands for jisr and reset
\[ jisr \equiv jisr(core, I, eev, pff, pfls) \qquad reset \equiv eev[0] \]
Note that the fetched instruction and data read from the memory system are ignored in cer-
tain cases of the processor core step. For example, when the instruction misalignment exception
is raised, the instruction is irrelevant in the core transition function.
Transition Conditions:
1. in translated mode the walk \( w_I \) must be a walk from the TLB and must match the translation
request for instruction fetch
\[ mode(core) \implies w_I \in c.cpu.tlb \land hit(trqI, w_I) \]
2. in translated mode the walk \( w_I \) must be complete in case the page fault on fetch is not
raised
\[ mode(core) \land \lnot pff \implies complete(w_I) \]
3. if in translated mode the successfully fetched instruction is an instruction accessing the
memory system then the matching walk \( w_D \) required for data access must be present in
the TLB
\[ mode(core) \land \lnot pff \land mem(I) \implies w_D \in c.cpu.tlb \land hit(trqD, w_D) \]
4. this matching walk \( w_D \) must be complete in case the page fault on load/store does not
occur
\[ mode(core) \land \lnot pff \land mem(I) \land \lnot pfls \implies complete(w_D) \]
5. the compare-and-swap, locked write, or fence instruction can only be executed when the
store buffer is empty
\[ cas(I) \lor locksw(I) \lor mfence(I) \implies c.cpu.sb = \varepsilon \]
6. the store buffer must be flushed before the core switches from system to user mode
\[ \lnot mode(core) \land emode(core) \land eret(I) \implies c.cpu.sb = \varepsilon \]
7. the store buffer must also be empty in case of an interrupt in user mode
\[ mode(core) \land jisr \implies c.cpu.sb = \varepsilon \]
8. we have the same requirement for the store buffer in case of reset in any mode
\[ reset \implies c.cpu.sb = \varepsilon \]
Transition Effect:
• processor core
\[ c'.cpu.core = \delta_{core}(core, I, R, eev, pff, pfls) \]
• store buffer
\[ c'.cpu.sb = \begin{cases} (pmaD[31:2], sv(core, I)) \circ c.cpu.sb & : \lnot jisr \land sw(I) \\ c.cpu.sb & : \text{otherwise} \end{cases} \]
• translation lookaside buffer
First, we consider the effect of invlpg(I) which invalidates a single virtual page address
and drops all incomplete walks:
\[ \begin{aligned}
asid &\equiv core.gpr(rs(I))[5:0] \\
vpa &\equiv core.gpr(rd(I)).vpa \\
tlb' &\equiv \delta_{tlb}(c.cpu.tlb, (\text{flush}, asid, vpa)) \\
tlb'' &\equiv \delta_{tlb}(tlb', \text{flush-incomplete})
\end{aligned} \]
Then the effect of the core transition on the TLB is defined as follows:
\[ c'.cpu.tlb = \begin{cases}
\emptyset & : (\lnot jisr \land flush(I)) \lor reset \\
tlb'' & : \lnot jisr \land invlpg(I) \\
\delta_{tlb}(c.cpu.tlb, (\text{flush}, asid(core), core.pc.vpa)) & : pff \land \lnot reset \\
\delta_{tlb}(c.cpu.tlb, (\text{flush}, asid(core), ea(core, I).vpa)) & : \lnot pff \land pfls \land \lnot reset \\
c.cpu.tlb & : \text{otherwise}
\end{cases} \]
If there is a page fault detected by the TLB, it reacts by flushing all walks for the given
virtual page address. This operation allows the TLB to rewalk the page table after the
corresponding interrupt handling by the kernel/hypervisor and not to generate the page
faults repeatedly for the old walks. Note that the TLB might perform such a flush even
when the core handles an interrupt of higher priority and is in fact not interested in
the results of the address translation.
• memory
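\[
c'.m =
\begin{cases}
\delta_m(c.m, (\mathit{pma}_D, \mathit{sv}(\mathit{core}, I))) & : \neg\mathit{jisr} \wedge \mathit{locksw}(I) \\
\delta_m(c.m, (\mathit{core}.\mathit{gpr}(\mathit{rd}(I)), \mathit{pma}_D, \mathit{sv}(\mathit{core}, I))) & : \neg\mathit{jisr} \wedge \mathit{cas}(I) \\
c.m & : \text{otherwise}
\end{cases}
\]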
3.2.5.2 Store Buffer Step
During this step the memory write leaves the store buffer. The store buffer step is indicated by
the input
in = sb
Transition Conditions:
The store buffer can only make a step if it is not empty: c.cpu.sb ≠ ε.
Transition Effect:
• store buffer
sblen ≡ |c.cpu.sb|
c′.cpu.sb = c.cpu.sb[1 : sblen− 1]
• memory
sbe ≡ c.cpu.sb[sblen]
c′.m = δm(c.m, (sbe.a ◦ 00, sbe.v))
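The interplay between buffered stores of the core and the store buffer step can be illustrated by a small Python sketch. It is not part of the formal model and makes simplifying assumptions: the names core_store, sb_step and demo are hypothetical, the memory is a dictionary mapping the byte address of a word to the word value (instead of the byte-wise δm), and the buffer is a deque whose left end holds the newest entry, mirroring the prepending with ◦ in the core step.

```python
from collections import deque


def core_store(sb, word_addr, value):
    """sw on the core: prepend (word address, value), so sb[0] is the newest write."""
    sb.appendleft((word_addr, value))


def sb_step(sb, mem):
    """Store buffer step: the oldest entry sb[-1] leaves the buffer and reaches memory."""
    if not sb:
        raise ValueError("the store buffer step requires a non-empty buffer")
    word_addr, value = sb.pop()
    # sbe.a followed by two zero bits: word address -> byte address of the word
    mem[word_addr << 2] = value


def demo():
    sb, mem = deque(), {}
    core_store(sb, 0x10, 0xAAAA)   # older write
    core_store(sb, 0x10, 0xBBBB)   # newer write to the same word
    sb_step(sb, mem)
    after_first = mem[0x40]
    sb_step(sb, mem)
    return after_first, mem[0x40], len(sb)
```

Calling demo() shows that the writes leave the buffer in the order in which they were issued: the older value reaches the memory first and is then overwritten by the newer one.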
3.2.5.3 Translation Lookaside Buffer Step
Walk Creation This active step of TLB creates a new walk for a given virtual page address
vpa and is indicated by the transition function input
in = (tlb-create, vpa)
Transition Conditions:
The TLB will only create a new walk when the processor core is running in translated mode
mode(core)
Transition Effect:
w′ ≡ winit(vpa, core.spr(pto), asid(core))
c′.cpu.tlb = δtlb(c.cpu.tlb, (add-walk, w′))
Setting Accessed Bit The accessed bit of the page table entry required for extension of a
walk w is set by the active step of the TLB if the transition function input is
in = (add-accessed, w)
Transition Conditions:
1. the page table entry flag can only be set in translated mode
mode(core)
2. the TLB sets the accessed bit only for incomplete walks present in the TLB and corre-
sponding to the current address space identifier
w ∈ c.cpu.tlb ∧ ¬complete(w) ∧ w.asid = asid(core)
3. the accessed bit can only be set for a page table entry marked as present
pte(c.m,w).p
Transition Effect:
c′.m = δm(c.m, (ptea(w), seta(pte(c.m,w))))
Walk Extension The TLB extends an existing walk w provided in the input
in = (add-extend, w)
Transition Conditions:
1. walk extension is performed only in translated mode
mode(core)
2. the TLB extends only incomplete walks present in the TLB and corresponding to the cur-
rent address space identifier
w ∈ c.cpu.tlb ∧ ¬complete(w) ∧ w.asid = asid(core)
3. the access flag of the page table entry used for walk extension is set
pte(c.m,w).a
4. the extended walk w′ ≡ wext(w, pte(c.m,w), 000) is not faulty
¬w′.fault
Transition Effect:
c′.cpu.tlb = δtlb(c.cpu.tlb, (add-walk, w′))
3.3 Multi-Core MIPS-86
Definition 3.63 (Configuration of Multi-Core MIPS-86). For a number of processors np ∈ N
we define the configuration of the multi-core MIPS-86 machine:
\[
C_{\mathit{MMIPS}} \overset{\text{def}}{\equiv} (\mathit{cpu} : \mathbb{N}_{np} \to C_{\mathit{proc}},\ m \in C_m)
\]
where cpu is a function mapping processor index to its configuration, and m is shared memory.
Definition 3.64 (Multi-Core MIPS-86 Transition Function). Transitions of the multi-core ma-
chine are described by the function
δMMIPS : CMMIPS × Nnp × ΣMIPS ⇀ CMMIPS
such that for a configuration mc ∈ CMMIPS, an index p ∈ Nnp of a processor performing the
step, and its proper input in ∈ ΣMIPS, the result of the transition is defined as
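\[
\delta_{\mathit{MMIPS}}(\mathit{mc}, p, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
(\mathit{mc}.\mathit{cpu}[p \mapsto c'_p.\mathit{cpu}],\ c'_p.m) & : \delta_{\mathit{MIPS}}((\mathit{mc}.\mathit{cpu}(p), \mathit{mc}.m), \mathit{in}) = c'_p \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]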
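The lifting of the single core transition to the multi-core machine in Definition 3.64 can be sketched in Python as follows. The sketch is only illustrative: delta_mmips and toy_delta are hypothetical names, an undefined transition is represented by None, and the toy single core machine (a counter as processor state) merely stands in for δMIPS.

```python
def delta_mmips(mc, p, inp, delta_mips):
    """Lift a partial single core transition to the multi-core machine.

    mc is a pair (cpus, mem): a tuple of processor configurations and the
    shared memory. delta_mips returns None where the step is undefined.
    """
    cpus, mem = mc
    result = delta_mips((cpus[p], mem), inp)
    if result is None:
        return None  # the multi-core step is undefined as well
    new_cpu, new_mem = result
    # only processor p and the shared memory change
    return (cpus[:p] + (new_cpu,) + cpus[p + 1:], new_mem)


def toy_delta(c, inp):
    """Hypothetical single core machine: the processor state is a counter."""
    cpu, mem = c
    if inp == "inc":
        return (cpu + 1, mem)
    if inp == "publish":
        return (cpu, {**mem, 0: cpu})  # write the counter to address 0
    return None  # no transition for any other input
```

For instance, delta_mmips(((0, 5), {}), 1, "inc", toy_delta) changes only the configuration of processor 1, while an input without a single core transition yields None for the whole machine.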
4 Store Buffer Reduced MIPS-86 in the Context of Hypervisor/OS Kernels
Hypervisors or operating system kernels are usually written in mixed languages (e.g. C-like
language and macro assembly with inline assembly portions). On a multiprocessor system
their compiled code runs in system mode in parallel with the compiled code of guest operating
systems or processes being executed in user mode, and must guarantee their proper virtualiza-
tion.
To argue about hypervisor/kernel correctness we have to apply sequential compiler correct-
ness in the context of a multiprocessor system. However, compilers usually guarantee correct-
ness of executed code on the sequentially consistent memory which is not simply given for free
in the presence of store buffers. Since in this work we are interested in the verification of the
code running in system mode and can argue only about properties of this code, we have to re-
duce the ISA model introduced in Chapter 3 to a model where the store buffers of the processors
executing our kernel or hypervisor are invisible under certain conditions. For the guest/pro-
cess steps we will provide all components that are needed for further virtualization of guest
memory as it is shown by Kovalev for a restricted version of the full x86 architecture [Deg11] in
his doctoral work [Kov13].
As we have seen in Section 2.3 for a given unit we can apply the sequential simulation relation
for the part of the memory not locally owned by other units. The ownership safety will guaran-
tee that the unit does not access other units’ locally owned addresses. Therefore, to be able to
apply the compiler correctness on the sequentially consistent memory, we have to perform the
store buffer reduction exactly for addresses involved in the simulation relation.
In contrast to the hypervisor/kernel code we are dealing with, the memory access and own-
ership transfer policies used by the guests or processes are unknown to the hypervisor, and we
cannot rely on them. Therefore, we have to consider two distinct sets of memory addresses
for the hypervisor/kernel and the guests/processes respectively, and introduce the ownership
model only from the hypervisor/kernel point of view.
However, the hypervisor might need to access the memory of the guests to guarantee the
desired virtualization. Without a safety policy coming from the guests we cannot reduce store
buffers for guest steps. Hence the hypervisor running on one processor sees only the in-memory
values of the data belonging to a guest being executed on another processor, although the actual
values might still be in that processor's store buffer. As a consequence, the compiler
consistency relation couples only the values from the memory for such addresses. For such
hypervisor write accesses, [Kov13] suggests using the atomic compare-and-swap or the locked
memory write, which flush the store buffer and do not change the view of the guests on their
memory.
On the other hand, it is clear that the guests should not access the memory part of the
hypervisor, so that they influence neither its execution nor the treatment of its memory as
sequentially consistent. The only exception is the MMU steps during guest execution that might
use a shadow page table built by the hypervisor and belonging to its address space. If shadow
page tables are not shared and are individual for every processor as it is considered in [Kov13],
there cannot be any problem since our MIPS-86 ISA model guarantees that the store buffer is
flushed when we switch from system to user mode. Any stores in the SB will become visible
for the MMU acting during the guest steps on the same processor. A more complicated case
arises when a shadow page table (SPT) of a guest is shared between the processors. The hypervisor needs to make it
also shared in its ownership model, and the sequential consistency of a hypervisor portion of
memory occupied by this SPT might be desirable for the guest steps.
Now, as follows from the discussion above, for each processor of the multi-core MIPS-86
model, we need to have the sequentially consistent memory for owned (by a processor), read-
only and shared hypervisor/kernel addresses. A straightforward way to abstract the store
buffers is to use a simple discipline from [Kov13] guaranteeing that at any time there are no
pending writes to the same physical memory address in more than one store buffer of the
multi-core machine, and writes to the shared data never go into store buffers. Obviously, for
the owned non-shared addresses it will easily follow from our ownership safety policy.
However, in contrast to Kovalev’s work, in our ownership model the shared addresses are not
statically fixed: a unit may acquire the ownership on them, make them local as well as release
them. When a processor having in its store buffer a write to some local non-shared addresses
releases these addresses without writing them back to the physical memory, it will break the
consistency relation for other units because they will need to cover these addresses in their
simulation relations. Therefore, we will allow the ownership transfer only at hypervisor/kernel
memory accesses flushing the store buffer.
Such a discipline is quite strict in comparison to [CS10] where the authors introduced a more
general model and suggested flushing the store buffer only before a shared read in case there
were writes to shared data after the last flush. However, applying their theory to our
MIPS-86 model, and consequently to the semantics of the upper layers of the verification stack,
is rather involved; it is considered in the doctoral thesis [Che16] of Geng Chen, where he extended
the model with memory management unit steps [CCK14].
In the frame of this work, we will stick to the simpler approach introduced above, which
we find sufficient for the correctness of the model of software threads. We can also easily treat
the MMU steps in the store buffer reduction because, as one can see from Chapter 3, there are
no MMU transitions in system mode, which perfectly matches the algorithm with non-shared
shadow page tables. Moreover, we aim to apply the general theory from Chapter 2
to all layers of the verification stack in this work. The simpler store buffer reduction can be done
with the help of the Cosmos model simulation theorem, which in turn will show its full power
in correctness proofs for the first time.
Note that this chapter does not claim to introduce a new store buffer reduction strategy,
but rather serves a few purposes needed for the pervasive verification of the software threads
considered in this work: (i) the application of the store buffer reduction from [Kov13] for our
multi-core MIPS-86 machine and the ownership model from Chapter 2, (ii) determining and
checking the software conditions and safety policy enabling the SB reduction for MIPS-86 in
the presence of guest/user steps, (iii) working out definitions for the instantiation of the Cosmos
model with the multi-core MIPS-86 machine, and (iv) the detailed application of the general
concurrent simulation theorem for the bottom layer of the verification stack.
We proceed as follows. First, we introduce the store buffer reduced machine simulating our
reference MIPS-86 machine. In fact, we will use the same type of MIPS-86 configuration defined
before, in which we will require the store buffer to be empty when the processor is stepping in
system mode. The only difference is the transition function that will not write to the store
buffer in this case. Next, we will prove that such a reduced machine is encoded by the steps of
the reference (non-reduced) machine under the ownership safety and software conditions. For
this purpose we will apply directly the theory of Cosmos model and simulation in concurrency
by instantiating it with our reduced and reference MIPS-86 machines, and then formulating the
sequential simulation relation wrt. the ideas described above and proving the corresponding
assumptions.
4.1 Store Buffer Reduced MIPS-86
As we have mentioned above, for the store buffer reduced single core and multi-core MIPS-86
models we use the same configurations defined before, namely CMIPS and CMMIPS. The
definitions of the corresponding transition functions, however, need to be extended.
Definition 4.1 (Reduced Single Core MIPS-86 Transition Function). Transitions of the reduced
single core machine are defined by the function
δrMIPS : CMIPS × ΣMIPS ⇀ CMIPS
such that
\[
\delta_{\mathit{rMIPS}}(c, \mathit{in}) \overset{\text{def}}{\equiv} c'
\]
differs from the transition function δMIPS(c, in) only for the processor core steps determined by
in = (core, wI , wD, eev). The processor core makes a step under the same conditions, however,
the effect of the transition on its store buffer and the memory is defined as:
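\[
c'.\mathit{cpu}.\mathit{sb} =
\begin{cases}
(\mathit{pma}_D[31:2],\ \mathit{sv}(\mathit{core}, I)) \circ c.\mathit{cpu}.\mathit{sb} & : \neg\mathit{jisr} \wedge \mathit{sw}(I) \wedge \mathit{mode}(\mathit{core}) \\
c.\mathit{cpu}.\mathit{sb} & : \text{otherwise}
\end{cases}
\]
\[
c'.m =
\begin{cases}
\delta_m(c.m, (\mathit{pma}_D, \mathit{sv}(\mathit{core}, I))) & : \neg\mathit{jisr} \wedge (\mathit{locksw}(I) \vee \mathit{sw}(I) \wedge \neg\mathit{mode}(\mathit{core})) \\
\delta_m(c.m, (\mathit{core}.\mathit{gpr}(\mathit{rd}(I)), \mathit{pma}_D, \mathit{sv}(\mathit{core}, I))) & : \neg\mathit{jisr} \wedge \mathit{cas}(I) \\
c.m & : \text{otherwise}
\end{cases}
\]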
The definition for all other components of c′ is the same as in Section 3.2.5.
Note that since we do not write to the store buffer in system mode, and it is flushed if the
mode is changed (see Section 3.2.5.1), the initially empty store buffer will remain empty in
system mode and will not be able to make active steps in this case.
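The difference between the reference and the reduced machine for an uninterrupted sw can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the formal definition: store_word and run_kernel_stores are hypothetical names, the processor is a dictionary with the fields user_mode (standing for mode(core)) and sb (a list with the newest entry first), and the memory maps the byte address of a word to the word value.

```python
def store_word(cpu, mem, word_addr, value, reduced):
    """Effect of an uninterrupted sw on the store buffer and the memory."""
    if reduced and not cpu["user_mode"]:
        # reduced machine in system mode: the write goes directly to memory
        mem[word_addr << 2] = value
    else:
        # reference machine, or reduced machine in user mode: buffered write
        cpu["sb"].insert(0, (word_addr, value))


def run_kernel_stores(reduced):
    """Execute three stores in system mode, starting with an empty store buffer."""
    cpu, mem = {"user_mode": False, "sb": []}, {}
    for i in range(3):
        store_word(cpu, mem, i, 100 + i, reduced)
    return len(cpu["sb"]), mem
```

In this sketch the reduced machine ends with an empty store buffer and all three words in memory, whereas the reference machine ends with three pending entries and an unchanged memory, which matches the remark that the initially empty store buffer stays empty in system mode.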
Definition 4.2 (Reduced Multi-Core MIPS-86 Transition Function). Transitions of the reduced
multicore machine are described by the function
δrMMIPS : CMMIPS × Nnp × ΣMIPS ⇀ CMMIPS
such that for a configuration mc ∈ CMMIPS, an index of a processor p ∈ Nnp performing the
step, and its proper input in ∈ ΣMIPS, the result of the transition is defined as
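\[
\delta_{\mathit{rMMIPS}}(\mathit{mc}, p, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
(\mathit{mc}.\mathit{cpu}[p \mapsto c'_p.\mathit{cpu}],\ c'_p.m) & : \delta_{\mathit{rMIPS}}((\mathit{mc}.\mathit{cpu}(p), \mathit{mc}.m), \mathit{in}) = c'_p \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]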
4.2 Memory Address Space in Hypervisor Context
Since in the scope of this work we consider the hypervisor/OS kernel written in the aforemen-
tioned programming languages, we can define what exactly the memory address space of the
running system includes wrt. the system in question. We split the memory into portions pro-
vided as instantiation parameter of the model we consider. Later on, when we talk about exact
models determined by a code to be executed, we will instantiate these parameters correspond-
ingly.
We distinguish the following fixed sets of addresses partitioning the MIPS-86 memory space
in the context of the hypervisor/kernel running atop of the MIPS-86 multicore machine:
• a set of hypervisor or OS kernel addresses Ahyp ⊆ B32,
• a set of addresses of all guests/processes Aguest ⊂ B32 such that
Ahyp ∪Aguest = B32, Ahyp ∩Aguest = ∅,
• among hypervisor/kernel addresses we differentiate a code region Acode, sets of addresses
occupied by constants Aconst and other data addresses Adata that can be accessed by read
and write operations:
Ahyp = Acode ∪Aconst ∪Adata
Note that all these sets are fixed and pairwise disjoint:
∀X, Y ∈ {Acode, Aconst, Adata} . X ≠ Y =⇒ X ∩ Y = ∅
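The requirements on the address sets above can be checked mechanically for a concrete instantiation. The following Python sketch does this for a toy 8-bit address space; the function is_partition and the chosen address ranges are purely hypothetical and serve only to illustrate the partitioning conditions.

```python
def is_partition(universe, parts):
    """True if the sets in parts are pairwise disjoint and their union is universe."""
    seen = set()
    for part in parts:
        if seen & part:          # overlap with an earlier part
            return False
        seen |= part
    return seen == universe


# hypothetical instantiation on a toy 8-bit address space
A_code = set(range(0, 64))
A_const = set(range(64, 96))
A_data = set(range(96, 128))
A_hyp = A_code | A_const | A_data
A_guest = set(range(128, 256))
ALL_ADDRESSES = set(range(256))
```

Here is_partition(ALL_ADDRESSES, [A_hyp, A_guest]) and is_partition(A_hyp, [A_code, A_const, A_data]) both hold, so this instantiation satisfies the conditions on Ahyp, Aguest, Acode, Aconst and Adata.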
4.3 MIPS-86 Cosmos Model Instantiations
4.3.1 Reference Machine Instantiation
For the reference non-reduced multi-core MIPS-86 machine we define an instantiation SMIPS ∈
S of the Cosmos model containing np ∈ N processors.
• SMIPS.A = B32, SMIPS.V = B8 – The sets of addresses and values correspond to the type
of the global byte-addressable memory.
• SMIPS.R = Acode ∪ Aconst – The read-only addresses are defined by the corresponding
areas occupied by the translated hypervisor/kernel code and its variables annotated as
constants.
• SMIPS.nu = np – As the number of computational units we consider the number of pro-
cessors np ∈ N in the multicore MIPS-86 model.
• SMIPS.U = Cproc – The configuration of the computation unit is the MIPS-86 processor
configuration which includes the processor core, store buffer and TLB.
• SMIPS.E = ΣMIPS – We treat the MIPS-86 processor inputs as external inputs of the Cosmos
model unit. The input as shown before determines which component of the processor
makes a step and provides the corresponding component inputs.
• SMIPS.reads : U × (A → V) × E → 2^A – The set of memory addresses read during a step
depends on which processor component is involved in the transition, and in case of the
core step it is determined by an instruction to be executed.
Since we consider the byte-addressable memory as well as byte addresses for the safety
policy, while the MIPS-86 model supports only word accesses to the memory, we will use
a shorthand computing the set of 4 byte addresses covered by a word access at a given byte address.
Generally, for a byte address a ∈ B32 and a number d ∈ N of consecutive bytes to be
accessed we refer to {a}_d from Definition 1.24 for this purpose.
For the property instar(SMIPS) (given in Definition 2.2) to hold one has to include into the
reads-set all addresses to be read during one step transition of the MIPS-86 model.
For a MIPS-86 configuration c ∈ CMIPS, core ≡ c.cpu.core, and an input in ∈ ΣMIPS such
that we have in = (core, wI , wD, eev) and the transition function δMIPS(c, in) is defined
we first introduce separately sets of addresses AF (c, in) and AR(c, in) needed for the
instruction fetch and data read from the memory respectively in the absence of interrupts.
\[
A_F(c, \mathit{in}) \overset{\text{def}}{\equiv} \{\mathit{pma}_I(c, w_I)\}_4
\]
For the core step the set of data addresses to be read from the memory can only be defined
if the instruction I ≡ I(c, wI) is fetched successfully, meaning there were no exceptions
raised before we get interested in decoding of the fetched instruction. This set is not empty
in case of the compare-and-swap or load word instructions.
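\[
A_R(c, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
\{\mathit{pma}_D(c, w_D, I)\}_4 & : \mathit{load}(I) \\
\emptyset & : \text{otherwise}
\end{cases}
\]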
Now, depending on the interrupt handled by the core we define a set readscore-mem(c, in)
computing instruction and data memory addresses needed for the core step without tak-
ing into account accesses to the page table for the address translation. For example, in case
of an external interrupt, instruction address misalignment, or a page fault on fetch accord-
ing to the MIPS-86 ISA model semantics we are not interested in any data fetched/read
from the memory. In the absence of any interrupts and depending whether the instruc-
tion reads from the memory, the reads-set includes the fetch and read addresses defined
above. Using the shorthand il ≡ il(core, I, eev, pff(c, wI), pfls(c, wD, I)), we compute
readscore-mem(c, in) in the following way
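\[
\mathit{reads}_{\mathit{core\text{-}mem}}(c, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
\emptyset & : \mathit{il} \in \{\mathit{reset}, \mathit{I/O}, \mathit{pff}\} \vee \mathit{il} = \mathit{mal} \wedge \mathit{imal}(\mathit{core}) \\
A_F(c, \mathit{in}) & : \mathit{il} = \mathit{mal} \wedge \mathit{dmal}(\mathit{core}, I) \vee \mathit{il} = \mathit{pfls} \\
A_F(c, \mathit{in}) \cup A_R(c, \mathit{in}) & : \text{otherwise}
\end{cases}
\]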
Note here that there is a difference between pff and pff(c, wI). The former denotes the
synonym pff ≡ 4 introduced earlier in Table 3.5 for the corresponding interrupt level and
indicates the internal interrupt event signal, which is mutually exclusive with the other iev.
Its computation uses pff(c, wI) coming from the MMU.
As we know from the semantics of the MIPS-86 model, during a core step in user mode
we always try to extend a matching walk if it is incomplete. Therefore, we also define sets
of byte addresses readscore-fpte(c, in) and readscore-dpte(c, in) for reading required page
table entries for instruction fetch and data access respectively. For shortness we make the
definitions under conditions following from the MIPS-86 semantics. For any other cases
not satisfying them the sets are obviously empty.
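\[
\mathit{mode}(\mathit{core}) \wedge w_I \in c.\mathit{cpu}.\mathit{tlb} \wedge \mathit{hit}(\mathit{trq}_I(\mathit{core}), w_I) \wedge \neg\mathit{complete}(w_I) \Longrightarrow
\mathit{reads}_{\mathit{core\text{-}fpte}}(c, \mathit{in}) \overset{\text{def}}{\equiv} \{\mathit{ptea}(w_I)\}_4
\]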
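\[
\begin{aligned}
& \mathit{mode}(\mathit{core}) \wedge w_I \in c.\mathit{cpu}.\mathit{tlb} \wedge \mathit{hit}(\mathit{trq}_I(\mathit{core}), w_I) \wedge \mathit{complete}(w_I) \wedge {} \\
& \mathit{mem}(I) \wedge w_D \in c.\mathit{cpu}.\mathit{tlb} \wedge \mathit{hit}(\mathit{trq}_D(\mathit{core}, I), w_D) \wedge \neg\mathit{complete}(w_D) \Longrightarrow \\
& \mathit{reads}_{\mathit{core\text{-}dpte}}(c, \mathit{in}) \overset{\text{def}}{\equiv} \{\mathit{ptea}(w_D)\}_4
\end{aligned}
\]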
Putting it all together we easily get the set of all byte addresses to be read during the
processor core step with the input in:
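\[
\mathit{reads}_{\mathit{core}}(c, \mathit{in}) \overset{\text{def}}{\equiv} \mathit{reads}_{\mathit{core\text{-}fpte}}(c, \mathit{in}) \cup \mathit{reads}_{\mathit{core\text{-}dpte}}(c, \mathit{in}) \cup \mathit{reads}_{\mathit{core\text{-}mem}}(c, \mathit{in})
\]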
For a TLB step indicated by in = (add-accessed, w) or in = (add-extend, w) we
introduce
\[
\mathit{reads}_{\mathit{tlb}}(\mathit{in}) \overset{\text{def}}{\equiv} \{\mathit{ptea}(w)\}_4
\]
Now, depending on the input in showing which processor component makes a step, we
define the reads-set for the MIPS-86 model.
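\[
\mathit{reads}_{\mathit{MIPS}}(c, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{reads}_{\mathit{core}}(c, \mathit{in}) & : \mathit{in} = (\mathit{core}, w_I, w_D, \mathit{eev}) \\
\mathit{reads}_{\mathit{tlb}}(\mathit{in}) & : \mathit{in} = (\mathit{add\text{-}accessed}, w) \vee \mathit{in} = (\mathit{add\text{-}extend}, w) \\
\emptyset & : \text{otherwise}
\end{cases}
\]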
As the instantiation for the Cosmos model with MIPS-86 we set
SMIPS.reads(u,m, in) = readsMIPS((u,m), in)
• SMIPS.δ : U × (A⇀ V)×E ⇀ U × (A⇀ V) – The instantiation of the transition function is
based on the single core MIPS-86 transition function. As follows from the definition, the
Cosmos model transition affects the unit’s configuration and the portion of memory to be
written.
We define the set of addresses to be written from the point of view of the processor. Since
the stores with sw go to the processor’s store buffer, they do not appear in the memory
component. However, to be able to apply the safety policy explicitly for addresses written
to the store buffer we include these addresses into the aforementioned set. This will be also
the case for the reduced machine where any writes of the hypervisor/kernel are applied
for the abstract memory without the store buffer.
For in = (core, wI , wD, eev), c ∈ CMIPS, as well as shorthands I ≡ I(c, wI), core ≡
c.cpu.core, and jisr ≡ jisr(core, I, eev, pff(c, wI), pfls(c, wD, I)) we introduce predicates
swap(c, in) indicating if the cas-instruction writes to the memory, and wr(c, in) showing
that the core performs a write operation to the memory or store buffer:
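\[
\mathit{swap}(c, \mathit{in}) \overset{\text{def}}{\equiv} \mathit{cas}(I) \wedge \mathit{core}.\mathit{gpr}(\mathit{rd}(I)) = c.m_4(\mathit{pma}_D(c, w_D, I))
\]
\[
\mathit{wr}(c, \mathit{in}) \overset{\text{def}}{\equiv} \neg\mathit{jisr} \wedge (\mathit{sw}(I) \vee \mathit{locksw}(I) \vee \mathit{swap}(c, \mathit{in}))
\]
\[
\mathit{writes}_{\mathit{core}}(c, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
\{\mathit{pma}_D(c, w_D, I)\}_4 & : \mathit{wr}(c, \mathit{in}) \\
\emptyset & : \text{otherwise}
\end{cases}
\]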
For a TLB step indicated by in = (add-accessed, w) we introduce
\[
\mathit{writes}_{\mathit{tlb}}(\mathit{in}) \overset{\text{def}}{\equiv} \{\mathit{ptea}(w)\}_4
\]
In case of a store buffer step we have
\[
\mathit{writes}_{\mathit{sb}}(c) \overset{\text{def}}{\equiv} \{c.\mathit{cpu}.\mathit{sb}[|c.\mathit{cpu}.\mathit{sb}|].a \circ 00\}_4
\]
So, the writes-set for the MIPS-86 model is then defined in the following way:
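\[
\mathit{writes}_{\mathit{MIPS}}(c, \mathit{in}) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{writes}_{\mathit{core}}(c, \mathit{in}) & : \mathit{in} = (\mathit{core}, w_I, w_D, \mathit{eev}) \\
\mathit{writes}_{\mathit{tlb}}(\mathit{in}) & : \mathit{in} = (\mathit{add\text{-}accessed}, w) \\
\mathit{writes}_{\mathit{sb}}(c) & : \mathit{in} = \mathit{sb} \\
\emptyset & : \text{otherwise}
\end{cases}
\]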
Note that the definition will be used only when the transition function is defined.
To instantiate the unit transition function for the Cosmos model, we need to transform
the partial memory m : B32 ⇀ B8 defined by the reads-set to the type Cm that is a total
mapping. Such a memory ⌈m⌉ maps addresses outside of the reads-set to some dummy
values, which are not accessed if the reads-set is instantiated properly and are introduced
only to match the formal definitions.
Now, depending whether the single core MIPS-86 transition is defined or not, we instan-
tiate the Cosmos model unit transition function as follows
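\[
S_{\mathit{MIPS}}.\delta(u, m, \mathit{in}) =
\begin{cases}
(u',\ m'|_{\mathit{writes}_{\mathit{MIPS}}((u, \lceil m \rceil), \mathit{in})}) & : \delta_{\mathit{MIPS}}((u, \lceil m \rceil), \mathit{in}) = (u', m') \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]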
• SMIPS.IP : U×(R → V)×E → B – Since for the store buffer reduction we will provide the
simulation relation for every step and do not apply the order reduction we will consider
as an interleaving point any MIPS-86 configuration from which the transition exists for
a given input. For simplicity, we set the predicate to true for any input parameters of
the transition because we are interested in the interleaving points only in cases of the
transition step existence.
SMIPS.IP(u,m, in) = 1
• SMIPS.IO : U × (R → V) × E → B – The instantiation of IO-points depends on whether
a guest or the hypervisor/kernel makes a step.
For our simple store buffer reduction discipline we do not allow a shared write of the
hypervisor/kernel to be in the SB. Therefore, we treat the instruction sw in system mode
as a local step.
As for the guest/user steps, we will consider any processor step in user mode as suitable
for an IO-operation without looking at the exact operation performed by the processor. In
fact, the definition of the IO-points must depend only on the read-only memory, which is not
possible for guest steps from the hypervisor point of view because of the address translation.
Therefore, technically in the Cosmos model we cannot identify the memory instruction performed
by the core in user mode.
Moreover, later in this work, when we apply the order reduction for the hypervisor/kernel,
we will treat all guest/user steps in the same manner and preserve their order in the
global schedule. The reason is that the safety policies of guests/users are in general unknown
to us.
Sticking to the previously used shorthands we define
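\[
\begin{aligned}
\mathcal{IO}_{\mathit{MIPS}}(c, \mathit{in}) \overset{\text{def}}{\equiv}{} & \neg\mathit{mode}(\mathit{core}) \wedge \mathit{in} = (\mathit{core}, w_I, w_D, \mathit{eev}) \wedge \neg\mathit{jisr} \wedge \mathit{mem}(I) \wedge \neg\mathit{sw}(I) \\
& \vee \mathit{mode}(\mathit{core})
\end{aligned}
\]
\[
S_{\mathit{MIPS}}.\mathcal{IO}(u, m, \mathit{in}) = \mathcal{IO}_{\mathit{MIPS}}((u, \lceil m \rceil_{\mathbb{B}^{32}}), \mathit{in})
\]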
Note here that extending the memory with dummy values does not influence the predi-
cate.
• SMIPS.OT : U × (R → V) × E → B – We allow ownership transfer only at hypervisor/kernel
operations flushing the store buffer, so that the data at released addresses becomes
available to other processors.
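\[
\mathcal{OT}_{\mathit{MIPS}}(c, \mathit{in}) \overset{\text{def}}{\equiv} \neg\mathit{mode}(\mathit{core}) \wedge \mathit{in} = (\mathit{core}, w_I, w_D, \mathit{eev}) \wedge \neg\mathit{jisr} \wedge (\mathit{cas}(I) \vee \mathit{locksw}(I))
\]
\[
S_{\mathit{MIPS}}.\mathcal{OT}(u, m, \mathit{in}) = \mathcal{OT}_{\mathit{MIPS}}((u, \lceil m \rceil_{\mathbb{B}^{32}}), \mathit{in})
\]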
This instantiation trivially guarantees that the property instaot(SMIPS) is not violated.
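The IO- and OT-predicates of this instantiation can be sketched in Python and compared by exhaustive enumeration. The sketch rests on several assumptions that are not taken from the formal model: the names io_mips, ot_mips and ot_implies_io are hypothetical, instructions are abstracted to the five illustrative kinds alu, lw, sw, locksw and cas, mem(I) is assumed to hold exactly for the last four, and the property instaot is assumed here to require that every ownership-transfer step is an IO-step.

```python
from itertools import product

# assumed abstraction: instruction kinds for which mem(I) holds
MEMORY_KINDS = {"lw", "sw", "locksw", "cas"}


def io_mips(user_mode, is_core_step, jisr, kind):
    """IO-point: any step in user mode, or an uninterrupted memory
    instruction other than sw executed by the core in system mode."""
    system_io = (not user_mode and is_core_step and not jisr
                 and kind in MEMORY_KINDS and kind != "sw")
    return system_io or user_mode


def ot_mips(user_mode, is_core_step, jisr, kind):
    """Ownership transfer: only at uninterrupted cas or locked writes
    of the core in system mode."""
    return (not user_mode and is_core_step and not jisr
            and kind in {"cas", "locksw"})


def ot_implies_io():
    """Check by enumeration that every OT-step is also an IO-step."""
    kinds = ["alu", "lw", "sw", "locksw", "cas"]
    flags = [False, True]
    return all(io_mips(*args) or not ot_mips(*args)
               for args in product(flags, flags, flags, kinds))
```

Under these assumptions ot_implies_io() evaluates to True, and io_mips(False, True, False, "sw") is False, reflecting that sw in system mode is treated as a local step.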
4.3.2 Reduced Machine Instantiation
For the store buffer reduced MIPS-86 machine the instantiation of the Cosmos model is similar
except those parameters that are based on the transition function. We denote this Cosmos model
as SrMIPS ∈ S and consider its signature.
• For components X ∈ {A,V,R,nu,U , E , reads, IP, IO,OT } the instantiation is equal to
the one for the reference MIPS-86
SrMIPS.X = SMIPS.X
• The instantiation of δ is now based on the transition function of the store buffer reduced
single core MIPS-86 machine:
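\[
S_{\mathit{rMIPS}}.\delta(u, m, \mathit{in}) =
\begin{cases}
(u',\ m'|_{\mathit{writes}_{\mathit{MIPS}}((u, \lceil m \rceil), \mathit{in})}) & : \delta_{\mathit{rMIPS}}((u, \lceil m \rceil), \mathit{in}) = (u', m') \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]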
As we have mentioned above, we use here the same definitions of the reads- and writes-sets
as for the reference MIPS-86 machine.
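The construction of the unit transition function used in both instantiations (completing the partial memory with dummy values, applying the single core transition, and restricting the resulting memory to the writes-set) can be sketched in Python as follows. All names (DUMMY, CompletedMemory, unit_delta, toy_delta, toy_writes) are hypothetical, and the toy machine with a single copy operation merely stands in for the MIPS-86 transition function.

```python
DUMMY = 0  # arbitrary filler; never read if the reads-set is instantiated properly


class CompletedMemory(dict):
    """Total view of a partial memory: unmapped addresses read as DUMMY."""

    def __missing__(self, addr):
        return DUMMY


def unit_delta(u, m, inp, delta, writes):
    """Cosmos-style unit transition built from a transition on total memories."""
    total = CompletedMemory(m)              # complete the partial memory
    result = delta((u, total), inp)
    if result is None:
        return None                         # undefined transition
    u2, m2 = result
    # keep only the portion of memory that the step writes
    return u2, {a: m2[a] for a in writes((u, total), inp)}


def toy_delta(c, inp):
    """Hypothetical machine: ('copy', src, dst) copies one memory cell."""
    u, mem = c
    if inp[0] != "copy":
        return None
    _, src, dst = inp
    new_mem = CompletedMemory(mem)
    new_mem[dst] = mem[src]
    return u + 1, new_mem


def toy_writes(c, inp):
    return {inp[2]} if inp[0] == "copy" else set()
```

For example, unit_delta(0, {4: 7}, ("copy", 4, 8), toy_delta, toy_writes) returns (1, {8: 7}): the unit only needs the cell it reads and reports back only the cell it writes.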
4.4 Store Buffer Reduction Correctness
Now we can justify the SB reduced multi-core MIPS-86 model by showing the existence of its
steps encoded by any steps of the reference machine under certain conditions. We proceed
exactly in the way introduced for the concurrent simulation in Chapter 2.
4.4.1 Simulation for the Single Core Machine
First, we define the simulation relation for the store buffer reduction from the point of view of
a single processor in the model and trivially prove the simulation for any step of the single core
machine. This case corresponds to the sequential simulation theorem for the Cosmos machine
where the ownership is not considered yet.
Definition 4.3 (Single Core MIPS-86 Simulation Relation for SB Reduction). For c, cr ∈ CMIPS
such that c is a configuration of the reference single core MIPS-86 machine and cr is a
configuration of the store buffer reduced machine, and a set of byte addresses icm ⊂ Ahyp
representing a possibly inconsistent memory region of the hypervisor/kernel, we define the
simulation relation simrMIPS(c, cr, icm) between two machines denoting that
• the configurations of the cores and TLBs of both machines are equal,
• the SB of the reduced machine is empty in system mode and is equal to the SB of the
reference machine in user mode,
• the guests/users memory region of the reduced machine is equal to this region in the
physical memory of the reference machine,
• the content of the hypervisor/kernel memory excluding a possible inconsistent region
resides in the SB or the physical memory of the reference machine in system mode, and
only in the physical memory in user mode.
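\[
\begin{aligned}
\mathit{sim}_{\mathit{rMIPS}}(c, c_r, \mathit{icm}) \overset{\text{def}}{\equiv}{}
& \text{(i)}\ \ c_r.\mathit{cpu}.\mathit{core} = c.\mathit{cpu}.\mathit{core} \\
& \text{(ii)}\ \ c_r.\mathit{cpu}.\mathit{tlb} = c.\mathit{cpu}.\mathit{tlb} \\
& \text{(iii)}\ \ c_r.\mathit{cpu}.\mathit{sb} =
\begin{cases}
\varepsilon & : \neg\mathit{mode}(c_r.\mathit{cpu}.\mathit{core}) \\
c.\mathit{cpu}.\mathit{sb} & : \text{otherwise}
\end{cases} \\
& \text{(iv)}\ \ \forall a \in A_{\mathit{guest}}.\ c_r.m(a) = c.m(a) \\
& \text{(v)}\ \ \forall a \in A_{\mathit{hyp}} \setminus \mathit{icm}.\ c_r.m(a) =
\begin{cases}
\mathit{ms}(c.\mathit{cpu}.\mathit{sb}, c.m)(a) & : \neg\mathit{mode}(c_r.\mathit{cpu}.\mathit{core}) \\
c.m(a) & : \text{otherwise}
\end{cases}
\end{aligned}
\]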
Note that the last condition is important for the MMU steps when the processor switches back
to user mode after interrupt handling. Moreover, when we use this simulation relation in the
concurrent setting, we will set icm to the locally owned addresses of other processors since the
memory values at such addresses might be in their store buffers.
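The simulation relation can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: forwarded_memory and sim_rmips are hypothetical names, ms(sb, m) is assumed to be the memory overlaid with the pending stores so that newer entries override older ones, addresses are treated as abstract keys (ignoring the distinction between word and byte addresses), and a configuration is a dictionary with the fields core, tlb, sb (newest entry first) and mem.

```python
def forwarded_memory(sb, mem):
    """Assumed reading of ms(sb, m): memory overlaid with the pending stores."""
    view = dict(mem)
    for addr, value in reversed(sb):   # oldest first, so the newest write wins
        view[addr] = value
    return view


def sim_rmips(c, cr, icm, a_guest, a_hyp):
    """Conditions (i)-(v) of the simulation relation on toy configurations."""
    user = cr["core"]["user_mode"]
    expected_sb = c["sb"] if user else []
    hyp_view = c["mem"] if user else forwarded_memory(c["sb"], c["mem"])
    return (cr["core"] == c["core"]                                   # (i)
            and cr["tlb"] == c["tlb"]                                 # (ii)
            and cr["sb"] == expected_sb                               # (iii)
            and all(cr["mem"].get(a) == c["mem"].get(a)
                    for a in a_guest)                                 # (iv)
            and all(cr["mem"].get(a) == hyp_view.get(a)
                    for a in a_hyp - icm))                            # (v)


# reference machine in system mode with two pending stores to address 8
core = {"user_mode": False, "pc": 0}
c = {"core": core, "tlb": frozenset(), "sb": [(8, 2), (8, 1)],
     "mem": {8: 0, 100: 5}}
# reduced machine: empty store buffer, memory already holds the newest value
cr = {"core": core, "tlb": frozenset(), "sb": [], "mem": {8: 2, 100: 5}}
```

With a_hyp = {8, 12} and a_guest = {100}, the relation holds for icm = ∅, since the reduced memory holds the newest pending value at address 8. If the reduced memory held a different value there, the relation would hold only when address 8 belongs to icm.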
We introduce also a technical invariant stating that there are no pending writes to icm and
guest/user memory in the store buffer of the processor in system mode.
Definition 4.4 (SB Invariant). For a processor configuration cpu ∈ Cproc and a set icm ⊂ Ahyp
we define
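\[
\mathit{inv}_{\mathit{sb}}(\mathit{cpu}, \mathit{icm}) \overset{\text{def}}{\equiv} \neg\mathit{mode}(\mathit{cpu}.\mathit{core}) \Longrightarrow \forall a \in \mathit{icm} \cup A_{\mathit{guest}}.\ \neg\mathit{sbhit}(\mathit{cpu}.\mathit{sb}, a[31:2])
\]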
Obviously, as one of the conditions in the sequential simulation theorem we require that icm is
not accessed by the reduced machine.
Definition 4.5 (No Access to icm by Single Core Reduced Machine). For a step of the single
core reduced machine from a configuration cr ∈ CMIPS and determined by an input inr ∈ ΣMIPS
we define
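\[
\mathit{noacc}_{\mathit{rMIPS}}(c_r, \mathit{in}_r, \mathit{icm}) \overset{\text{def}}{\equiv} (\mathit{reads}_{\mathit{MIPS}}(c_r, \mathit{in}_r) \cup \mathit{writes}_{\mathit{MIPS}}(c_r, \mathit{in}_r)) \cap \mathit{icm} = \emptyset
\]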
Moreover, following the ideas from the discussion at the beginning of the chapter, we formulate
the software conditions that are sufficient for proving our simulation relation in the sequential
context. For the concurrent context we will later additionally rely on the ownership policy.
Definition 4.6 (Software Conditions for SB Reduction). For a step of the single core reduced
machine from a configuration cr ∈ CMIPS and determined by an input inr ∈ ΣMIPS we require
that in user mode the processor core and SB do not access the hypervisor/kernel addresses, and
the hypervisor/kernel is not allowed to write to the guest/user memory by the sw instruction.
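\[
\begin{aligned}
\mathit{sc}_{\mathit{rMIPS}}(c_r, \mathit{in}_r) \overset{\text{def}}{\equiv}{}
& \text{(i)}\ \ \mathit{mode}(c_r.\mathit{cpu}.\mathit{core}) \wedge \mathit{in}_r = (\mathit{core}, w_I, w_D, \mathit{eev}) \Longrightarrow \\
& \qquad \mathit{reads}_{\mathit{core\text{-}mem}}(c_r, \mathit{in}_r) \cup \mathit{writes}_{\mathit{core}}(c_r, \mathit{in}_r) \subset A_{\mathit{guest}} \\
& \text{(ii)}\ \ \mathit{mode}(c_r.\mathit{cpu}.\mathit{core}) \wedge \mathit{in}_r = \mathit{sb} \Longrightarrow \mathit{writes}_{\mathit{sb}}(c_r) \subset A_{\mathit{guest}} \\
& \text{(iii)}\ \ \neg\mathit{mode}(c_r.\mathit{cpu}.\mathit{core}) \wedge \mathit{in}_r = (\mathit{core}, w_I, w_D, \mathit{eev}) \wedge {} \\
& \qquad \mathit{writes}_{\mathit{core}}(c_r, \mathit{in}_r) \cap A_{\mathit{guest}} \neq \emptyset \Longrightarrow \neg\mathit{sw}(I(c_r, w_I))
\end{aligned}
\]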
The following theorem justifies our store buffer reduction discipline for any step of the single
core MIPS-86 and is needed for establishing the concurrent simulation. We also guarantee the
correspondence of the IO- and OT -steps between the reduced and reference machines accord-
ing to the requirement oneIO from Chapter 2.
Theorem 4.1 (SB Reduction for a Step of the Single Core MIPS-86).
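\[
\begin{aligned}
& \forall c, c', c_r \in C_{\mathit{MIPS}},\ \mathit{in} \in \Sigma_{\mathit{MIPS}},\ \mathit{icm} \in 2^{\mathbb{B}^{32}}. \\
& \quad \text{(i)}\ \ \mathit{icm} \subset A_{\mathit{hyp}} \wedge \mathit{sim}_{\mathit{rMIPS}}(c, c_r, \mathit{icm}) \wedge \mathit{inv}_{\mathit{sb}}(c.\mathit{cpu}, \mathit{icm}) \\
& \quad \text{(ii)}\ \ c' = \delta_{\mathit{MIPS}}(c, \mathit{in}) \\
& \quad \text{(iii)}\ \ \forall \mathit{in}_r \in \Sigma_{\mathit{MIPS}},\ c'_r \in C_{\mathit{MIPS}}.\ c'_r = \delta_{\mathit{rMIPS}}(c_r, \mathit{in}_r) \Longrightarrow \\
& \qquad\qquad \mathit{noacc}_{\mathit{rMIPS}}(c_r, \mathit{in}_r, \mathit{icm}) \wedge \mathit{sc}_{\mathit{rMIPS}}(c_r, \mathit{in}_r) \\
& \Longrightarrow \\
& \exists c'_r \in C_{\mathit{MIPS}}. \\
& \quad \text{(i)}\ \ c'_r = c_r \wedge \neg\mathcal{OT}_{\mathit{MIPS}}(c, \mathit{in})\ \vee \\
& \qquad\quad \exists \mathit{in}_r \in \Sigma_{\mathit{MIPS}}.\ c'_r = \delta_{\mathit{rMIPS}}(c_r, \mathit{in}_r)\ \wedge \\
& \qquad\qquad \mathcal{IO}_{\mathit{MIPS}}(c_r, \mathit{in}_r) \Longrightarrow \mathcal{IO}_{\mathit{MIPS}}(c, \mathit{in})\ \wedge \\
& \qquad\qquad \mathcal{OT}_{\mathit{MIPS}}(c, \mathit{in}) \Longleftrightarrow \mathcal{OT}_{\mathit{MIPS}}(c_r, \mathit{in}_r) \\
& \quad \text{(ii)}\ \ \mathit{sim}_{\mathit{rMIPS}}(c', c'_r, \mathit{icm}) \wedge \mathit{inv}_{\mathit{sb}}(c'.\mathit{cpu}, \mathit{icm})
\end{aligned}
\]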
Proof: The theorem is proven by a case split on the step c′ = δMIPS(c, in) performed by the
processor of the reference machine in translated and untranslated modes. As icm we choose any
set icm ⊂ Ahyp of addresses that cannot be covered by the simulation relation at the beginning
of the step and are not accessed during the step in the reduced machine.
For the proof we introduce a few shorthands. Namely, for the configurations c, cr, c′, c′r and
processor components X ∈ {core, sb, tlb} we denote X ≡ c.cpu.X , Xr ≡ cr.cpu.X , X ′ ≡
c′.cpu.X , X ′r ≡ c′r.cpu.X .
Since in the starting configurations the cores of both machines are coupled, the modes are
also trivially equivalent mode(corer) = mode(core).
First, we consider a hypervisor/kernel step performed in untranslated mode on the reference
machine. From the semantics of the single core MIPS-86 model we know that only the SB and
core can make a step and in case of a core step the input walks are ignored.
1. Store buffer step: in = sb
The SB writes a word to the memory at the address a such that sbhit(c.cpu.sb, a[31 : 2])
holds. All other processor components are not changed. From invsb(c.cpu, icm) we know
a /∈ icm ∪ Aguest. Therefore, the guest memory in c′.m is not changed and the buffer
commits the store to the portion of the hypervisor/kernel memory covered by the simu-
lation relation, namely cr.m4(a) = ms4(sb, c.m). An empty step of the reduced machine
c′r = cr preserves the simulation relation and we have c′r.m4(a) = ms4(sb′, c′.m). More-
over, invsb(c′.cpu, icm) still holds because the SB contains now one pending store less.
2. Core step: in = (core, wI , wD, eev)
Along with the step of the reference machine we consider the core step of the reduced
machine with the same input inr = in and c′r = δrMIPS(cr, inr).
In case of an external interrupt or the instruction misalignment no memory access is per-
formed and the cores of both machines make the same non-empty step (which is also not
an IO-step) preserving the simulation relation. Note that by reset the TLBs are flushed
and become empty. Moreover, in the reference machine for the reset to be handled by the
core we require the SB to be already empty, meaning that the SB must have made its steps
if it had pending stores.
Otherwise, the processors read the same instruction I = ms4(c)(core.pc) = m4(cr)(corer.pc)
from the consistent portion of memory, which follows from the simulation relation and the
predicate noaccrMIPS(cr, inr, icm).
If we have ¬mem(I) or mem(I) ∧ dmal(core, I), the instruction does not read/write from/to
the memory or the store buffer, and the cores perform an equivalent step. The TLBs are
flushed in the same manner for flush and invlpg instructions. In case of the mfence
instruction the core of the reference machine makes the step only if the SB has written back
its pending stores. In the reduced machine the execution of mfence affects only the core.
Therefore, for the aforementioned cases the theorem trivially holds.
If we have ¬mode(core) ∧ emode(core) ∧ eret(I), then the requirement on the store buffer
of the reference machine is the same as for mfence, sb = ε, and one obviously has ∀a ∈
Ahyp\icm. mr(a) = m(a). On the other hand, in the reduced machine from the simulation
relation we also have sbr = ε, concluding sb′r = sb′ = ε. Both of the cores are in user mode
in c′ and c′r, the coupling relation as well as the theorem hold.
For other instructions I we have to distinguish between lw, sw, cas, and locksw:
• data word read lw(I)
The reference machine reads a word R = ms4(c)(ea(core, I)) from the memory or
the store buffer. The reduced machine gets also the data Rr = cr.m(ea(corer, I)) at
the same address ea(corer, I) = ea(core, I) because the cores are equivalent. From
noaccrMIPS(cr, inr, icm) and the simulation relation we conclude that both machines
read the same data R = Rr from the consistent memory. Therefore, the cores perform
the same step leading to consistent configurations such that simrMIPS(c′, c′r, icm) holds.
4 Store Buffer Reduced MIPS-86 in the Context of Hypervisor/OS Kernels
The configuration of the store buffer does not change, which preserves the invariant
invsb(c′.cpu, icm). Moreover, for lw(I) we have IOMIPS(c, in) = IOMIPS(cr, inr) = 1
and OTMIPS(c, in) = OTMIPS(cr, inr) = 0, which concludes the claim.
• data word write sw(I)
As argued above in both machines the computed effective addresses are equal and
out of icm. From the software condition (iii) we know that the hypervisor/kernel
does not write to the guest memory by sw(I). Therefore, both machines write the
data word at the byte addresses belonging to the consistent portion Ahyp \ icm. In
the reference machine the write goes to the store buffer preserving invsb(c′.cpu, icm).
The reduced machine writes the same word to its memory. Therefore, one easily
concludes the memory consistency ∀a ∈ Ahyp \ icm. c′r.m(a) = ms(sb′, c′.m)(a).
Moreover, one has IOMIPS(c, in) = IOMIPS(cr, inr) = 0. The theorem holds for
this case.
• locked write locksw(I)
In comparison to sw, the execution of locksw requires the store buffer of the refer-
ence machine to be empty and writes directly to the memory in both machines. As
in the previous case the machines store the same word to the consistent portion of
the memory, however, including in this case not only the hypervisor addresses but
also the guest region. Since the SB of the reference machine does not receive a new
pending store, the theorem obviously holds.
• compare-and-swap cas(I)
Analogously to the previous step, the core step for cas(I) in the reference machine
is made when its SB is empty. After the transition one still has sb′r = sb′ = ε. In
contrast to the previously considered lw the read result R is ignored by the core, and
the operation is performed atomically by the memory, namely
c′.m = δm(c.m, (core.gpr(rd(I)), ea(core, I), sv(core, I))).
From the software conditions and noaccrMIPS(cr, inr, icm) we already know that
the byte addresses to be read and written belong to the guest or hypervisor memory
consistent for the machines and excluding icm. Therefore, it is easy to conclude
that the memory transition in both machines is the same, and the relation
simrMIPS(c′, c′r, icm) holds. Since the SB is not changed, its invariant is preserved.
Additionally, the cas operation is considered to be an IO-step suitable for the
ownership transfer.
This last case finishes the proof of the theorem for a hypervisor/kernel step.
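The interplay of sw, lw, and sb steps in the hypervisor/kernel case can be replayed on a toy model. The following Python sketch is an illustration under strong simplifying assumptions and not the MIPS-86 formalization: memory is a dictionary over addresses, the store buffer is a FIFO of (address, value) pairs, and the names RefMachine, RedMachine, coupled, and run_demo are hypothetical. It checks after every step that the memory of the reduced machine equals the memory of the reference machine as seen through its store buffer, with the reduced machine making an empty step whenever the reference machine commits a pending store.

```python
from collections import deque

def ms(sb, m):
    """Memory as seen through the store buffer: the newest pending store wins."""
    view = dict(m)
    for addr, val in sb:              # oldest first, so later stores overwrite
        view[addr] = val
    return view

class RefMachine:
    """Toy reference machine: stores go through a FIFO store buffer."""
    def __init__(self, m):
        self.m, self.sb = dict(m), deque()
    def sw(self, addr, val):
        self.sb.append((addr, val))
    def lw(self, addr):
        return ms(self.sb, self.m)[addr]
    def sb_step(self):
        addr, val = self.sb.popleft()  # commit the oldest pending store
        self.m[addr] = val

class RedMachine:
    """Toy reduced machine: stores are written to the memory directly."""
    def __init__(self, m):
        self.m = dict(m)
    def sw(self, addr, val):
        self.m[addr] = val
    def lw(self, addr):
        return self.m[addr]

def coupled(ref, red):
    """Memory part of the simulation relation for a processor in system mode."""
    return red.m == ms(ref.sb, ref.m)

def run_demo():
    init = {0: 0, 4: 0, 8: 0}
    ref, red = RefMachine(init), RedMachine(init)
    trace = [("sw", 0, 7), ("sw", 4, 9), ("lw", 0), ("sb",),
             ("sw", 0, 1), ("sb",), ("lw", 0), ("sb",)]
    for step in trace:
        if step[0] == "sw":
            ref.sw(step[1], step[2])
            red.sw(step[1], step[2])
        elif step[0] == "lw":
            assert ref.lw(step[1]) == red.lw(step[1])
        else:
            ref.sb_step()              # the reduced machine makes an empty step
        assert coupled(ref, red)
    return ref, red
```

Calling run_demo() replays the trace and raises an AssertionError as soon as the coupling is violated. The sketch ignores the set icm, the guest region, and the ownership discipline, so it only illustrates why an empty step of the reduced machine matches a store buffer commit.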
Now, we consider a guest/process step in translated mode. In this case any processor com-
ponent can perform a transition depending on the input in and the machine’s semantics. Since
we mark any guest/process step as an IO-point even in the case of a non-memory opera-
tion, and no such steps are suitable for the ownership transfer, we have only to prove that
the reduced machine makes also a corresponding non-empty step in translated mode, namely
∃inr ∈ ΣMIPS. c′r = δrMIPS(cr, inr) such that simrMIPS(c′, c′r, icm) holds.
At first glance, both machines are equal and should preserve the simulation relation if
inr = in. However, one has to argue about the equivalence of the data read and the instruction
fetched from the memory and show that the memory consistency is not violated. To avoid missing
details, we consider the following cases:
1. Walk creation: in = (tlb-create, va).
The TLB of the reference machine creates an initial walk for a chosen virtual address va
and no memory operation is performed. Since the TLBs of both machines are equal before
and after the step, the theorem holds.
2. Setting the accessed bit: in = (add-accessed, w).
From the reference machine transition function this step is defined when the conditions
w ∈ c.cpu.tlb ∧ ¬complete(w) ∧ w.asid = asid(c.cpu.core) hold and the page table entry is
present: pte(c.m,w).p. For the reduced machine to perform the same step one has to show
pte(cr.m,w) = pte(c.m,w). This is true indeed, because from noaccrMIPS(cr, inr, icm) we
know that the accessed pte resides in the guest or hypervisor memory covered by the
consistency relation. So, both machines perform the same transition setting the bit in the
memory and the claims hold.
3. Walk extension: in = (add-extend, w)
In addition to the conditions of the previous case, the accessed flag of the page table
entry must be set, pte(c.m, w).a, and the extended walk w′ = wext(w, pte(c.m, w), 000) must
not be faulty, ¬w′.fault. As shown above, the pte read is the same for both machines; therefore,
they make the same step preserving the relation.
4. Store buffer step: in = sb
In case of a store buffer transition of the non-reduced machine the reduced machine makes
the same step. From the software condition (ii) the write of the memory is made at an
address belonging to the guest. Therefore, the simulation holds also after the step.
5. Core step: in = (core, wI , wD, eev)
For a core step in translated mode one has to consider the address translation for the
instruction address, and for the effective address if the instruction address translation
succeeds and the fetched instruction accesses the memory.
If the walk wI present in the TLB and matching the translation request is incomplete,
similarly to the case of the TLB setting accessed bit one can show that in both machines
the TLB reads the same page table entry pte(cr.m,wI) = pte(c.m,wI).
Therefore, for the complete or incomplete walk wI one easily concludes
pff = pff(cr, wI) = pff(c, wI).
Now, for the complete walk wI we calculate the same translated address pmaI(cr, wI) =
pmaI(c, wI) which according to the software condition (i) refers to the instruction in the
guest memory. From the simulation relation we then get I = I(cr, wI) = I(c, wI).
Again, as above, for the complete wI and incomplete wD present in the TLB and matching
the data translation request we get pte(cr.m,wD) = pte(c.m,wD) and
pfls = pfls(cr, wD, wI , I) = pfls(c, wD, wI , I).
The physical address pmaD(cr, wD, I) = pmaD(c, wD, I) computed in case of complete(wD)
analogously points to a word residing in the guest memory. Hence, the word accessed at
this address is equal in both of the machines.
Now, we consider how according to the semantics the components of the single core
MIPS-86 model are changed during the core step. First, we turn our attention to the TLB.
For the incomplete walk wI with the faulty extension the TLBs of both machines are
flushed in the same way independently of the reaction of the core. In turn, the walk
wD is ignored by the TLB. Therefore, we get tlb′r = tlb′.
If wI is complete and mem(I) holds, but the incomplete wD causes a faulty extension, then
the TLBs are flushed similarly for the effective address and are coupled.
The transition of the processor core and passive changes on other components but the TLB
depend on whether an interrupt is raised in the core or not.
• jisr(core, I, eev, pff, pfls):
In this case the core switches to system mode as a result of the step. Since in both
machines the cores are equal and we have proven that the instruction as well as the
page fault signals from the TLBs are the same, then
il(corer, I, eev, pff, pfls) = il(core, I, eev, pff, pfls).
Hence, the cores handle the same interrupt and are coupled after the step. The mem-
ory is not changed. As for the store buffers, in case of the interrupt the transition
function of MIPS-86 is only defined when sbr = sb = ε. This obviously concludes the
claim for simrMIPS(c′, c′r, icm).
• ¬jisr(core, I, eev, pff, pfls):
In the absence of interrupts in the reference machine, the reduced machine performs
the same step. If the instruction I does not interfere with the memory or store buffer,
only the core configurations are changed and equal after the transition. The instruc-
tions flush and invlpg were covered in the previous case because they cause the
illegal instruction interrupt in user mode. The instruction mfence can be performed
if the store buffers are empty, and, therefore, does not change anything except the
program counter. For any instruction I with mem(I) we have already proven above
that it accesses a word in the coupled guest memory and, as a result, does not destroy
the simulation relation for the hypervisor portion of memory. We conclude
that all components in both MIPS-86 machines are changed in the same way, which,
in turn, preserves simrMIPS(c′, c′r, icm).
Now, to establish the concurrent simulation for any unit of the Cosmos models instantiated
with the reduced and reference MIPS-86, we properly instantiate the sequential simulation
framework $R^{S_{rMIPS}}_{S_{MIPS}} \in \mathbb{R}$. Since there is no specific simulation parameter, for simplicity we
drop the parameter equal to ⊥ in all predicates. So, for c, cr ∈ Cproc × Cm, cpu, cpur ∈ Cproc,
in, inr ∈ ΣMIPS where the subscript r indicates the reduced machine, and $icm \in 2^{\mathbb{B}^{32}}$, we have
\[
R^{S_{rMIPS}}_{S_{MIPS}}.
\begin{cases}
P = \bot \\
sim(c, c_r, icm) = icm \subset A_{hyp} \land sim_{rMIPS}(c, c_r, icm) \land inv_{sb}(c.cpu, icm) \\
sc(c_r, in_r) = sc_{rMIPS}(c_r, in_r) \\
X = 1 \quad \text{for } X \in \{CPa(cpu_r),\ CPc(cpu),\ wfa(c_r),\ wfc(c),\ suit(in),\ wb(c, in)\}
\end{cases}
\]
Since the theorem just proven above fully corresponds to the statement of Theorem 2.3 for
RSrMIPSSMIPS , and we consider the simulation for every step of the reference machine such that an
empty step of the reduced machine is treated as an empty consistency block for the Cosmos
machine transition, it is easy to show that using Theorem 4.1 one also proves Theorem 2.3 for
our case. For brevity, we leave this bookkeeping here as a trivial exercise.
4.4.2 SB Reduction for the Multi-Core MIPS-86
Finally, to prove store buffer reduction for the multi-core MIPS-86 machine in the presence of
the ownership model, we define the invariants needed according to Chapter 2 and show that
the required assumptions hold.
Definition 4.7 (Shared Invariant for Multi-Core MIPS-86 SB Reduction). For the Cosmos models
instantiated with the reference and reduced MIPS-86 machines we demand the equality of
their sets of shared addresses S, Sr, read-only addresses R, Rr, and addresses O(p), Or(p)
owned by each unit p, as well as of the contents of the shared and read-only memories m, mr.
sinvSrMIPSSMIPS ((m,S,R,O), (mr,Sr,Rr,Or))
def≡
(i) S = Sr
(ii) R = Rr
(iii) ∀p ∈ Nnu . O(p) = Or(p)
(iv) m = mr
As we have already seen, we forbid the guest/process to modify the hypervisor/kernel's
ownership. However, so far the hypervisor is able to acquire the guest/user addresses, which, in
turn, might restrict guest/user accesses because we use the same safety policy for both kinds
of steps. Since the guest/process steps should also be safe and we would like to allow guests
and processes always to access their memory, we need to guarantee that the hypervisor/kernel
never acquires anything from Aguest.
Definition 4.8 (Restriction on the Ownership Transfer in SB Reduced Machine). For any con-
figuration E ∈ CSrMIPS of the SB reduced Cosmos machine we require that Aguest is always
treated as shared and no addresses from Aguest can be owned by any execution unit.
PSrMIPS(E) def≡ Aguest ⊂ E.S ∧ ∀p ∈ Nnu. E.Op ∩ Aguest = ∅
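As a small illustration, the restriction can be written as an executable predicate over finite address sets. The sketch below is only a reading aid: the function name p_restriction, the dictionary representation of the ownership sets, and the concrete addresses are hypothetical, and ⊂ is read as non-strict inclusion.

```python
def p_restriction(a_guest, shared, owned):
    """Guest addresses are shared and no unit owns a guest address.

    a_guest, shared: sets of addresses
    owned: dict mapping a unit index p to its set of owned addresses
    """
    guest_is_shared = a_guest <= shared
    nobody_owns_guest = all(not (o & a_guest) for o in owned.values())
    return guest_is_shared and nobody_owns_guest

# Hypothetical configuration: addresses 0..3 form the guest region.
A_GUEST = frozenset(range(4))
ok = p_restriction(A_GUEST, shared={0, 1, 2, 3, 8}, owned={0: {9}, 1: {10}})
bad = p_restriction(A_GUEST, shared={0, 1, 2, 3, 8}, owned={0: {2, 9}, 1: {10}})
```

In the second configuration unit 0 owns the guest address 2, so the predicate is violated.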
Note that with this setting the software condition (iii) not allowing the hypervisor/kernel
to access Aguest by the instruction sw becomes redundant because the memory access policy
for our instantiated Cosmos model already covers this case. To avoid additional bookkeeping,
however, we leave the software conditions unmodified.
From Definition 4.4 of the SB invariant being applied in the concurrent simulation we know
that the SB of a processor in system mode does not contain stores to Aguest and other processors’
local addresses. Nevertheless, we need to extend it also for the Cosmos model. The SB invariant
together with its extension must guarantee that in system mode the store buffer contains only
writes local to the processor.
Definition 4.9 (Extension of the SB Invariant for the Cosmos Machine). For any processor
being in system mode in a configuration cpu ∈ Cproc in the reference Cosmos machine with
shared addresses S we additionally demand that its SB does not contain pending writes to the
hypervisor/kernel shared and read-only memory.
extinvsb(cpu,S) def≡ ¬mode(cpu.core) =⇒
∀a ∈ Acode ∪Aconst ∪ (S \Aguest) . ¬sbhit(cpu.sb, a[31 : 2])
Therefore, we instantiate the concrete machine unit invariant from Chapter 2 as follows:
uinvSrMIPSSMIPS (cpu,O,S) = extinvsb(cpu,S)
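The extended invariant can be sketched as a check over a finite store buffer. In the following Python fragment the store buffer is assumed to be a list of (byte address, value) pairs and the word address a[31 : 2] is computed as a >> 2; the function names and the flag user_mode (standing for mode(cpu.core)) are hypothetical and serve only as an illustration.

```python
def sbhit(sb, word_addr):
    """True if the store buffer holds a pending store to this word address."""
    return any(addr >> 2 == word_addr for addr, _ in sb)

def extinv_sb(user_mode, sb, shared, a_guest, a_code, a_const):
    """In system mode no pending store targets code, constants,
    or shared hypervisor/kernel addresses."""
    if user_mode:                      # the invariant only restricts system mode
        return True
    forbidden = a_code | a_const | (shared - a_guest)
    return not any(sbhit(sb, a >> 2) for a in forbidden)

# Hypothetical byte-address layout used only for this example.
A_GUEST = set(range(0, 16))
A_CODE = set(range(16, 24))
A_CONST = set(range(24, 32))
SHARED = A_GUEST | set(range(32, 40))

local_store = [(40, 5)]    # address 40 is neither shared nor read-only
shared_store = [(33, 5)]   # address 33 is a shared hypervisor address
```

With this layout extinv_sb holds for local_store in system mode and fails for shared_store, while in user mode it holds for both.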
Before proving the assumptions, we consider a few auxiliary lemmas.
Lemma 4.1 (Hypervisor Memory Addresses Covered by the Unit’s SB Reduction Sequential
Simulation). For any configuration E ∈ CrMIPS of the SB reduced MIPS Cosmos machine and a
unit p ∈ Nnu the set of hypervisor memory addresses without non-shared owned addresses of all units
except p is equal to the union of the hypervisor read-only, shared and owned by the unit p addresses if the
ownership configuration satisfies the invariant oinv(E) and the property PSrMIPS(E).
oinv(E) ∧ PSrMIPS(E) =⇒ Ahyp \ SO(E.G, p) = SrMIPS.R∪ (E.S \Aguest) ∪ (E.Op \ E.S)
Proof: For the proof we first apply Lemma 2.8
SrMIPS.A \ SO(E.G, p) = SrMIPS.R∪ E.S ∪ (E.Op \ E.S)
From the Cosmos machine instantiation and the partitioning of the memory addresses we
know SrMIPS.A = B32 = Aguest ∪ Ahyp and Aguest ∩ Ahyp = ∅. Moreover, PSrMIPS(E) requires
that the guests’ addresses cannot be owned. Therefore, one trivially transforms the left-hand
side
SrMIPS.A \ SO(E.G, p) = (Aguest ∪ Ahyp) \ SO(E.G, p)
= Aguest ∪ (Ahyp \ SO(E.G, p)),
where Aguest ∩ (Ahyp \ SO(E.G, p)) = ∅.
Since SrMIPS.R = Acode ∪ Aconst, SrMIPS.R ⊂ Ahyp, and according to PSrMIPS(E) the guest
addresses belong to the shared memory and cannot be owned, we can re-write the right-hand
side as
SrMIPS.R∪ E.S ∪ (E.Op \ E.S) = Aguest ∪ (SrMIPS.R∪ (E.S \Aguest) ∪ (E.Op \ E.S))
with Aguest ∩ (SrMIPS.R∪ (E.S \Aguest) ∪ (E.Op \ E.S)) = ∅ and easily conclude
Ahyp \ SO(E.G, p) = SrMIPS.R∪ (E.S \Aguest) ∪ (E.Op \ E.S)
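The set identity of Lemma 4.1 can be sanity-checked on small finite configurations. The Python sketch below is not part of the proof and rests on assumptions: SO(G, p) is taken, as in the lemma statement, to be the non-shared addresses owned by units other than p, and the ownership invariant oinv is assumed to require disjoint ownership sets, unowned read-only addresses, and that every address is read-only, shared, or owned. The exact definition is in Chapter 2. All function names and the 16-address universe are hypothetical.

```python
import random

def so(owned, shared, p):
    """Non-shared addresses owned by units other than p."""
    return set().union(*(o - shared for q, o in owned.items() if q != p))

def random_config(rng, n_units=3):
    """Random ownership state satisfying the assumed oinv and the restriction P."""
    a_guest = set(range(0, 4))
    a_hyp = set(range(4, 16))
    read_only = {4, 5}                   # stands in for Acode ∪ Aconst
    shared = set(a_guest)                # P: guest addresses are shared
    owned = {p: set() for p in range(n_units)}
    for a in sorted(a_hyp - read_only):
        owner = rng.choice([None] + list(owned))
        if owner is not None:
            owned[owner].add(a)          # each address has at most one owner
        if owner is None or rng.random() < 0.3:
            shared.add(a)                # unowned addresses must be shared
    return a_guest, a_hyp, read_only, shared, owned

def lemma_4_1_holds(cfg, p):
    a_guest, a_hyp, read_only, shared, owned = cfg
    lhs = a_hyp - so(owned, shared, p)
    rhs = read_only | (shared - a_guest) | (owned[p] - shared)
    return lhs == rhs

def check_lemma(trials=200, seed=0):
    rng = random.Random(seed)
    return all(lemma_4_1_holds(cfg, p)
               for cfg in (random_config(rng) for _ in range(trials))
               for p in range(3))
```

check_lemma() samples configurations and compares both sides for every unit. A configuration containing a hypervisor address that is neither read-only, shared, nor owned violates the assumed oinv and makes the two sides differ.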
Lemma 4.2 (Shared and Read-Only Memory Equality from SB Reduction Simulation Rela-
tion). Given the concurrent simulation relation for a unit p between SB reduced and reference Cosmos
machines with configurations E and D respectively, one can deduce the memory coupling present in
the shared invariant, if the corresponding ownership states are coupled, the ownership invariant of the
reduced machine and the restriction on the ownership transfer are fulfilled, and the unit invariant holds.
∀D ∈ CSMIPS , E ∈ CSrMIPS , p ∈ Nnu.
(i) csimp(D.M,E)
(ii) D.S = E.S ∧ SMIPS.R = SrMIPS.R
(iii) PSrMIPS(E) ∧ oinv(E) ∧ uinvp(D)
=⇒
E.m|E.S∪SrMIPS.R = D.m|D.S∪SMIPS.R
Proof: Consider the SB reduction simulation relation for unit p:
\[
\begin{aligned}
csim_p(D.M, E) &= sim_p(D.M, E.M, SO(E.G, p)) \\
&= sim_{rMIPS}((D.u_p, D.m), (E.u_p, E.m), SO(E.G, p))
\end{aligned}
\]
From the definition of simrMIPS for unit p we get the relations between the memories of the
reduced and reference MIPS machines:
\[
\forall a \in A_{guest}.\ E.m(a) = D.m(a) \tag{4.1}
\]
\[
\forall a \in A_{hyp} \setminus SO(E.G, p).\ E.m(a) =
\begin{cases}
ms(D.u_p.sb, D.m)(a) & : \neg mode(E.u_p.core) \\
D.m(a) & : \text{otherwise}
\end{cases}
\tag{4.2}
\]
Moreover, by Lemma 4.1 we know
Ahyp \ SO(E.G, p) = SrMIPS.R∪ (E.S \Aguest) ∪ (E.Op \ E.S)
To prove the claim E.m|E.S∪SrMIPS.R = D.m|D.S∪SMIPS.R we distinguish cases depending on
the processor mode.
1. Translated mode: mode(E.up.core).
Since the invariant PSrMIPS(E) requires Aguest ⊂ E.S, we get from equations (4.1) - (4.2)
∀a ∈ SrMIPS.R∪ E.S. E.m(a) = D.m(a)
Because we have D.S = E.S ∧ SMIPS.R = SrMIPS.R, the goal to be proven transforms to
E.m|E.S∪SrMIPS.R = D.m|E.S∪SrMIPS.R
This is exactly what we have just shown above.
2. System mode: ¬mode(E.up.core).
First, we concentrate on the read-only and shared hypervisor addresses. Unfolding the
definition of the memory system ms in the equation (4.2) we get
\[
\forall a \in S_{rMIPS}.R \cup (E.S \setminus A_{guest}).\ E.m(a) =
\begin{cases}
sbv(D.u_p.sb, a) & : sbhit(D.u_p.sb, a[31:2]) \\
D.m(a) & : \text{otherwise}
\end{cases}
\]
Since the processor of the reference machine is also in system mode (the cores are coupled
via csimp(D.M,E)), from uinvp(D) and the lemma hypothesis (ii) we conclude
∀a ∈ SrMIPS.R∪ (E.S \Aguest) . ¬sbhit(D.up.sb, a[31 : 2])
Therefore, the memory simulation for these addresses boils down to
∀a ∈ SrMIPS.R∪ (E.S \Aguest) . E.m(a) = D.m(a)
Using (4.1), the fact that the guest memory addresses are included into the shared ones,
and the same argumentation about the ownership state as in the previous case, we easily
conclude the claim of the lemma.
Now we prove a lemma corresponding to Assumption 2.1. For brevity we again skip the
simulation parameter and consider only the premises needed in our case. Moreover, in order
to avoid additional transformations later on, we state the lemma in the original notation from
Chapter 2 on the consistency blocks though for the store buffer reduction such blocks boil down
to one step transition information.
Lemma 4.3 (Safety Transfer and Invariants Preservation for Store Buffer Reduction).
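\[
\forall D \in C_{S_{MIPS}},\ d' \in M_{S_{MIPS}},\ E, E' \in C_{S_{rMIPS}},\ \sigma \in \Theta^*_{S_{MIPS}},\ \tau \in \Theta^*_{S_{rMIPS}},\ o_\tau \in \Omega^*_{S_{rMIPS}},\ p \in \mathbb{N}_{nu}.
\]
(i) $D.M \overset{\sigma}{\longmapsto} d' \land blk(\sigma, p) \land oneIO(\sigma, \tau) \land uinv_p(D)$
(ii) $E \overset{\langle \tau, o_\tau \rangle}{\longmapsto} E' \land blk(\tau, p) \land P_{S_{rMIPS}}(E) \land safe_{P_{S_{rMIPS}}}(E, \langle \tau, o_\tau \rangle)$
(iii) $csim_p(D.M, E) \land sinv(D, E) \land csim_p(d', E')$
$\Longrightarrow$
$\exists o_\sigma \in (\Omega_{S_{MIPS}})^*,\ G' \in G_{S_{MIPS}}.$
(i) $D \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d', G') \land safe(D, \langle \sigma, o_\sigma \rangle) \land uinv_p((d', G'))$
(ii) $sinv((d', G'), E')$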
Proof: In the lemma we have to consider either empty steps in both machines, or one step of
the reference machine because we deal with the store buffer reduction at each step of the MIPS
model execution.
In the former case, one has only to show that oinv(D.G) holds. This fact trivially follows
from safe(E, ε) and sinv(D,E). Note that oinv(D.G) is also a part of the ownership safety for a
non-empty step.
For the latter case, in Theorem 4.1 we have already seen that either a non-empty step or an
empty step of the reduced machine corresponds to one step of the reference machine. The same
is also easy to see from the lemma premises when we unfold the definitions of blk and oneIO.
Formally, we have |σ| = |τ | = 1 ∧ σ1.s = τ1.s = p or |σ| = 1 ∧ σ1.s = p ∧ τ = oτ = ε. Moreover,
the reduced machine makes an empty step only in case of a store buffer step of the processor in
system mode in the reference machine. This also follows from the consideration of steps leading
to the configurations d′ and E′ with csimp(d′, E′).
To prove the lemma we make a case split on the step of the reference machine. Namely, we
analyze whether it is suitable for the ownership transfer and whether it is an IO-operation.
Case 1: ¬σ1.ot – The reference machine performs a step that cannot be used for the ownership
transfer.
For this case the reduced machine stays in the same configuration or makes a non-empty step.
Therefore, using oneIO(σ, τ) and the definition of the sequence 〈τ, oτ 〉 we have oτ = ε or |oτ | =
1 ∧ ¬τ1.ot. For the non-empty ownership transfer information sequence we easily get oτ1 =
(∅, ∅, ∅) because of the safety policy safePSrMIPS (E, 〈τ, oτ 〉). Therefore, for the implementation
step we also set oσ1 = (∅, ∅, ∅), which, in turn, does not violate the ownership transfer policy
policytrans(σ1.ot, D.Op, D.S, D.Ōp, oσ1) with D.Ōp ≡ ⋃q≠p D.Oq and preserves the reference
machine ownership state and the parts (i) - (iii) of the shared invariant after the step.
Assuming that the unit invariant uinvp((d′,G′)) also holds after the reference machine step,
we can apply Lemma 4.2 for the machine configurations (d′,G′) and E′ and get the last part (iv)
of the shared invariant sinv((d′,G′), E′).
Therefore, the claims left to be proven for this case are
• the transfer of the memory access policy
policyacc(σ1.io, readsp(D, σ1.in), writesp(D, σ1.in), D.Op, D.S, SMIPS.R, D.Ōp),
• and the unit invariant uinvp((d′,G′)) after the step. Note that from the instantiation we
have uinvp((d′,G′)) = extinvsb(d′.u(p),G′.S).
For this we look closer at the operations matching ¬σ1.ot.
Case 1.1: ¬σ1.io – The step is not matching an IO-operation. From the definition of IOMIPS we
know that in this case the processor can only be in system mode and we have to consider either
a store buffer step, or any core step except execution of the instructions lw(I), locksw(I), and
cas(I).
1. Store buffer step: in = sb
As we already know, the reduced machine makes an empty step in this case. Therefore,
we cannot directly transfer the memory access policy from safePSrMIPS (E, 〈τ, oτ 〉). How-
ever, using the invariant invsb(D.up,SO(E.G, p))1, the ownership coupling in sinv(D,E)
as well as the unit invariant uinvp(D) we conclude
∀a ∈ SMIPS.R∪D.S ∪ SO(D.G, p). ¬sbhit(D.up.sb, a[31 : 2])
Hence, since oinv(D.G) also holds, we get
∀a ∈ B32. sbhit(D.up.sb, a[31 : 2]) =⇒ a ∈ D.Op \D.S
For the store buffer step, the set readsp(D, σ1.in) is empty, and writesp(D, σ1.in) contains
the byte addresses of the writes pending in the SB, which shows writesp(D, σ1.in) ⊆ D.Op \
D.S and proves the transfer of the memory access policy.
Moreover, it is clear that the invariant extinvsb(d′.u(p),G′.S) is not violated because the
SB just commits one pending store.
2. Core step: in = (core, wI , wD, eev)
In case of an external interrupt or the instruction misalignment no memory access is per-
formed and the store buffer is unchanged. So, the memory access policy and the unit
invariant after the step trivially hold.
Otherwise, the processors of both machines fetch the same instruction at the same address,
which, in turn, implies that the set of accessed addresses AF((D.up, D.m), in) follows the
memory access policy due to safePSrMIPS (E, 〈τ, oτ 〉) and the coupling of the ownership
states of both machines.
If the instruction does not access the memory, or we have sw and data misalignment, the
write-set is empty and we obviously conclude the claim of the lemma.
Otherwise, we consider the instruction sw. The processor of the reduced machine writes
to the memory E.m. The reference machine puts the data into its store buffer. Both oper-
ations are performed at the addresses
writescore((E.up, E.m), in) = writescore((D.up, D.m), in)
belonging to E.Op \E.S and are safe because safePSrMIPS (E, 〈τ, oτ 〉) holds and the owner-
ship states are coupled.
Moreover, the unit invariant is also preserved.
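The argument for the store buffer step in Case 1.1 combines two exclusions: the SB invariant rules out pending stores to icm ∪ Aguest with icm = SO(G, p), and the extended invariant rules out stores to read-only and shared hypervisor addresses. The following Python sketch computes what remains on a small hypothetical configuration and compares it with Op \ S. It assumes an ownership invariant under which every address is read-only, shared, or owned by exactly one unit; the function names and the concrete address layout are illustrative only.

```python
def so(owned, shared, p):
    """Non-shared addresses owned by units other than p."""
    return set().union(*(o - shared for q, o in owned.items() if q != p))

def sb_targets_allowed(universe, a_guest, read_only, shared, owned, p):
    """Addresses not excluded by the SB invariant (with icm = SO(G, p))
    or by its extension for the Cosmos machine."""
    excluded_by_inv = so(owned, shared, p) | a_guest
    excluded_by_ext = read_only | (shared - a_guest)
    return universe - excluded_by_inv - excluded_by_ext

# Hypothetical ownership state over the addresses 0..11.
UNIVERSE = set(range(12))
A_GUEST = {0, 1, 2, 3}
READ_ONLY = {4, 5}
SHARED = A_GUEST | {6, 7}
OWNED = {0: {7, 8, 9}, 1: {10, 11}}

allowed_0 = sb_targets_allowed(UNIVERSE, A_GUEST, READ_ONLY, SHARED, OWNED, 0)
allowed_1 = sb_targets_allowed(UNIVERSE, A_GUEST, READ_ONLY, SHARED, OWNED, 1)
```

For both units the remaining set coincides with the owned non-shared addresses of the unit, which is the write-set bound needed for the memory access policy.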
1 It is present in csimp(D.M, E) according to the instantiation of the simulation framework $R^{S_{rMIPS}}_{S_{MIPS}}$.
Case 1.2: σ1.io – The step is matching an IO-operation. The situation corresponds either to a
guest step, or execution of lw(I) in system mode.
In case of lw, instead of the writes-sets, which are empty for this instruction, we consider the
reads-sets and the proof is similar to the step with sw. Moreover, the instruction lw does not
change the SB configuration.
For any guest step, both machines perform the same operation because their configurations
are equal. Since the memory access policy holds for the reduced machine and the ownership
states are coupled, the access policy of the reference machine is also safe. If the machines stay in
translated mode after the step, the unit invariant uinvp((d′,G′)) is true by definition. Otherwise,
after the guest step the store buffer of the non-reduced machine in system mode is empty and
this invariant is also preserved.
Case 2: σ1.ot – The reference machine performs a step used for the ownership transfer.
This case corresponds to the execution of the instructions locksw(I) and cas(I) by the refer-
ence machine core in system mode. The reduced machine performs the same step updating the
memory.
Since safePSrMIPS (E, 〈τ, oτ 〉) holds and the ownership states of both machines are equal, we
simply copy the ownership transfer information oσ1 = oτ 1 which fulfills the ownership transfer
policy for the reference machine and preserves the parts (i) - (iii) of the shared invariant after
the step.
It is also easy to show as in the previous cases that the memory access policy is not violated
because both machines read the instruction and write the data at the same addresses. Moreover,
the unit invariant holds in the configuration after the step where the store buffer remains empty.
Again, using Lemma 4.2 for the machine configurations (d′,G′) and E′, we get the last part
(iv) of the shared invariant.
Since there is no specific well-formedness in our models, Assumptions 2.2, 2.3 hold by default
and we continue with a lemma for the proof of Assumption 2.4.
Lemma 4.4 (Preservation of Simulation Relation and Unit’s Invariant for SB Reduction).
∀D,D′ ∈ CSMIPS , E,E′ ∈ CSrMIPS , p ∈ Nnu.
(i) csimp(D.M,E) ∧ uinvp(D)
(ii) sinv(D,E) ∧ PSrMIPS(E) ∧ oinv(E) ∧ oinv(D)
(iii) sinv(D′, E′) ∧ PSrMIPS(E′) ∧ oinv(E′) ∧ oinv(D′)
(iv) E ≈p E′ ∧D ≈p D′
=⇒
csimp(D′.M, E′) ∧ uinvp(D′)
Proof: According to our instantiation we have to prove the sequential simulation relation
simrMIPS((D′.up, D′.m), (E′.up, E′.m), SO(E′.G, p)),
the store buffer invariant invsb(D′.up, SO(E′.G, p)) with SO(E′.G, p) ⊂ Ahyp, and the unit
invariant uinvp(D′).
From csimp(D.M,E) we have
\[
E.u_p.core = D.u_p.core \tag{4.3}
\]
\[
E.u_p.tlb = D.u_p.tlb \tag{4.4}
\]
\[
E.u_p.sb =
\begin{cases}
\varepsilon & : \neg mode(E.u_p.core) \\
D.u_p.sb & : \text{otherwise}
\end{cases}
\tag{4.5}
\]
Since E ≈p E′ and D ≈p D′ hold, we get D′.up = D.up, E′.up = E.up. Therefore, the
equations (4.3) - (4.5) are also valid for configurations E′ and D′.
The shared invariant sinv(D′, E′) gives us the coupling of the read-only and shared memories
as well as the ownership state in E′ and D′, which, in turn, implies
E′.m|E′.S∪SrMIPS.R = D′.m|E′.S∪SrMIPS.R (4.6)
Having Aguest ⊂ E′.S in PSrMIPS(E′), we also get Aguest ⊂ D′.S and, therefore,
∀a ∈ Aguest. E′.m(a) = D′.m(a).
Now, we concentrate on the rest of the simulation relation, namely,
\[
\forall a \in A_{hyp} \setminus SO(E'.G, p).\ E'.m(a) =
\begin{cases}
ms(D'.u_p.sb, D'.m)(a) & : \neg mode(E'.u_p.core) \\
D'.m(a) & : \text{otherwise,}
\end{cases}
\]
where because of oinv(E′) and PSrMIPS(E′) the set of addresses covered by the relation is com-
puted by Lemma 4.1 as
Ahyp \ SO(E′.G, p) = SrMIPS.R∪ (E′.S \Aguest) ∪ (E′.Op \ E′.S) .
To prove this we split cases on the processor mode in E′.
1. User mode: mode(E′.up.core)
From equation (4.6) we easily conclude
∀a ∈ SrMIPS.R∪ (E′.S \Aguest) . E′.m(a) = D′.m(a)
and we only need to prove ∀a ∈ E′.Op \ E′.S. E′.m(a) = D′.m(a).
The processor core in configuration E is also in user mode because of the local configura-
tion equality E ≈p E′. Therefore, the simulation relation csimp(D.M,E) and the invari-
ants PSrMIPS(E), oinv(E) give us
∀a ∈ E.Op \ E.S. E.m(a) = D.m(a).
By E ≈p E′ and D ≈p D′ we know E.m|E.Op = E′.m|E.Op and D.m|D.Op = D′.m|D.Op .
Moreover, the ownership states are coupled as D.Op = E.Op in sinv(D,E). Therefore,
one concludes
∀a ∈ E.Op \ E.S. E′.m(a) = D′.m(a).
Since we have E′.Op ∩ E′.S = E.Op ∩ E.S and E.Op = E′.Op from E ≈p E′, we can
trivially show that the owned non-shared addresses do not change
E′.Op \ E′.S = E′.Op \ (E′.Op ∩ E′.S)
= E.Op \ (E.Op ∩ E.S)
= E.Op \ E.S
and conclude the claim ∀a ∈ E′.Op \ E′.S. E′.m(a) = D′.m(a).
Obviously, the invariants to be shown are preserved by definition because the processor
is in user mode. Using PSrMIPS(E′) and oinv(E′) we also have SO(E′.G, p) ⊂ Ahyp.
2. System mode: ¬mode(E′.up.core)
In this case we first prove the SB invariants and then consider the simulation relation.
Using the invariant invsb(D.up,SO(E.G, p)), the ownership coupling in sinv(D,E), as
well as the unit invariant uinvp(D) we conclude
∀a ∈ SMIPS.R ∪ D.S ∪ SO(D.G, p). ¬sbhit(D.up.sb, a[31 : 2]) (4.7)
Therefore, since the ownership invariant oinv(D.G) holds, we can obviously state
∀a ∈ B32. sbhit(D.up.sb, a[31 : 2]) =⇒ a ∈ D.Op \D.S.
Analogously to the previous case we apply D ≈p D′ to get D.Op \ D.S = D′.Op \
D′.S. Moreover, the store buffer configuration is unchanged, namely, D.up.sb = D′.up.sb.
Hence, the following is also valid in the configuration D′:
∀a ∈ B32. sbhit(D′.up.sb, a[31 : 2]) =⇒ a ∈ D′.Op \D′.S.
Applying oinv(D′) we easily conclude
∀a ∈ SMIPS.R ∪ D′.S ∪ SO(D′.G, p). ¬sbhit(D′.up.sb, a[31 : 2]),
which, in turn, by reason of PSrMIPS(E′) and sinv(D′, E′) shows invsb(D′.up, SO(E′.G, p))
and extinvsb(D′.up, D′.S). Recall that we have uinvp(D′) = extinvsb(D′.up, D′.S) by the
instantiation.
Since the sets of addresses are equal in both machines because of sinv(D′, E′), the rest of
the simulation relation to be proven boils down to the claims:
∀a ∈ SrMIPS.R∪ (E′.S \Aguest) . E′.m(a) = D′.m(a) (4.8)
∀a ∈ E′.Op \ E′.S. E′.m(a) = ms(D′.up.sb,D′.m)(a) (4.9)
The equation (4.8) follows directly from sinv(D′, E′).
From (4.7), sinv(D,E), and csimp(D.M,E) we analogously get
∀a ∈ E.Op \ E.S. E.m(a) = ms(D.up.sb,D.m)(a).
Since we have sinv(D,E) and the equalities E.Op \ E.S = E′.Op \ E′.S, E.m|E.Op =
E′.m|E.Op , D.m|D.Op = D′.m|D.Op , D′.up = D.up, we easily conclude (4.9).
This lemma completes the proof of the simple store buffer reduction applied in this thesis. From
now on we can directly rely on Theorem 2.4 between Cosmos machines instantiated with the
reduced and reference multi-core MIPS-86 models, and we will work with the reduced model when
we consider the upper layers of the verification stack.
5 Concurrent Mixed Machine Semantics for MIPS-86
In this chapter we consider a concurrent operational semantics for the combination of stack-
based programming languages C-IL and Macro assembly on the multi-core MIPS-86 model
from Chapter 3. Though the work is mostly based on the results from three doctoral
theses [Sha12, Sch13, Bau14b], we revise the semantics in detail, and simplify and adapt it so that
it can be easily used for the whole model stack covered in this thesis. The semantics covered
here is suitable for implementation and verification of applications and systems where the con-
text switch, stack substitution and operations on TLBs are not required. Later after stating the
corresponding compiler correctness and concurrent simulation we will extend it to the one per-
mitting such operations.
The C intermediate language (C-IL) was introduced by Sabine Schmaltz [Sch13] as an attempt
to create a simpler language independent from the underlying architecture and compiler and
suitable for the implementation of any standard-conforming C. In comparison to the C dialect
C0 [Lei08, PBLS15], the semantics of C-IL supports pointer arithmetic and does not have an
explicit notion of a heap, which, in turn, gives full control over the memory and allows
memory management to be implemented at the low system level.
Since the pure C-IL semantics does not support operations for the implementation of the con-
text switch usually performed by operating system kernels and hypervisors, Andrey Shadrin
in his work [Sha12] formulated a macro assembly (MASM) semantics and integrated it with
C-IL on the basis of the compiler calling convention. In this mixed programming model the
implementation and verification of the context switch in a simple hypervisor on a single-core
VAMP architecture became feasible, but only in the case of empty stacks.
Later, the C-IL and MASM semantics were each slightly adapted for a simplified version
of the multi-core MIPS-86 model and considered in the concurrent setting by Christoph
Baumann in his doctoral thesis [Bau14b].
5.1 Sequential Macro Assembly Semantics
5.1.1 Instructions and Programs
First, by IASM we denote all MIPS-86 assembly instructions listed in Tables 3.1–3.3 and
corresponding to their syntax. The indices of registers and immediate constants can be
represented in decimal notation. Additionally, we allow the immediate constant imm
and the instruction index iindex to be given in binary representation with the help of the
prefix 0b, e.g., addi 5 7 0bimm.
In order to convert any MIPS-86 assembly instruction from the set IASM into its 32-bit
representation, we introduce the function
codeASM : IASM → B32
Definition 5.1 (MIPS-86 Assembly Instructions for MASM). Now, we can define the set of
MIPS-86 assembly instructions supported by the MASM semantics:
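\[ I^{MASM}_{ASM} \overset{\text{def}}{\equiv} \{ instr \mid instr \in I_{ASM} \wedge plain(code_{ASM}(instr)) \} \]
where the predicate
\[ plain(I) \overset{\text{def}}{\equiv} alu(I) \vee su(I) \vee mem(I) \vee movg2s(I) \vee movs2g(I) \]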
denotes that the MIPS-86 instruction I ∈ B32 is not an instruction changing the control flow or
operating directly on the store buffer or the TLB.
Definition 5.2 (MASM Instructions). Let Pname be a set of procedure names admissible in macro
assembly. Then we define the set SMASM of MASM instructions in the following way:
• stack operations: r ∈ N0 =⇒ (push r) ∈ SMASM ∧ (pop r) ∈ SMASM
• parameter load/store: i ∈ N∧ r ∈ N0 =⇒ (lparam r i) ∈ SMASM ∧ (sparam r i) ∈ SMASM
• goto: l ∈ N =⇒ (goto l) ∈ SMASM
• conditional goto: r ∈ N0 ∧ l ∈ N =⇒ (ifnez r goto l) ∈ SMASM
• procedure call: pn ∈ Pname =⇒ (call pn) ∈ SMASM
• return from a procedure: ret ∈ SMASM
• assembly instructions: IMASMASM ⊂ SMASM
• inline MIPS-86 assembly¹: il ∈ (IASM)+ =⇒ asm{il} ∈ SMASM
Definition 5.3 (MASM Program). A MASM program is represented by a procedure table de-
fined by the type ProgMASM as a mapping from names of all procedures declared in the MASM
program to the relevant procedure information:
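\[ Prog_{MASM} \overset{\text{def}}{\equiv} P_{name} \rightharpoonup ProcT \]
where
\[ ProcT \overset{\text{def}}{\equiv} (npar \in \mathbb{N}_0,\ body \in S^{*}_{MASM} \cup \{\mathbf{extern}\},\ uses \in \mathbb{N}^{*}) \]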
is a set of procedure table entries with the following components:
• npar – the number of input parameters for the procedure,
• body – either the procedure body given as a list of MASM instructions, or the keyword
extern indicating that the procedure is declared as an external one²,
• uses – a list of indices for GPRs whose content is saved on the procedure entry and restored
on the return from this procedure.
Definition 5.4 (MASM External Procedure Predicate). Given a MASM program πµ ∈ ProgMASM,
we test whether a procedure pn ∈ Pname declared in πµ (i.e., pn ∈ dom(πµ)) is external or not
with the help of the predicate
\[ ext(pn, \pi_\mu) \overset{\text{def}}{\equiv} \pi_\mu(pn).body = \mathbf{extern} \]
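As an informal illustration of Definitions 5.3 and 5.4, the procedure table can be sketched as a dictionary from procedure names to entries. The sketch below is not part of the formal model: the tuple encoding of MASM instructions, the names `ProcT`, `Prog`, `EXTERN`, and the two example procedures are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

EXTERN = "extern"  # stands for the keyword `extern` (hypothetical encoding)

# A MASM instruction is modelled as a plain tuple, e.g. ("push", 16) or ("ret",).
Instr = Tuple


@dataclass(frozen=True)
class ProcT:
    npar: int                      # number of input parameters
    body: Union[List[Instr], str]  # list of MASM instructions, or EXTERN
    uses: Tuple[int, ...]          # GPR indices saved on entry, restored on return


# Partial mapping from procedure names to procedure table entries.
Prog = Dict[str, ProcT]


def ext(pn: str, prog: Prog) -> bool:
    """Predicate ext(pn, pi_mu) for a declared procedure pn."""
    return prog[pn].body == EXTERN


# Hypothetical two-procedure program used only as an example.
prog: Prog = {
    "swap": ProcT(npar=2, body=[("push", 16), ("pop", 16), ("ret",)], uses=(16,)),
    "putc": ProcT(npar=1, body=EXTERN, uses=()),
}
```

Here `ext("putc", prog)` holds while `ext("swap", prog)` does not; as in the definition, the predicate is only meaningful for names in the domain of the table.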
¹ The semantics of inline assembly is not covered in this chapter and will be considered later in the thesis. However, in
order to use the same definition of MASM programs later on when we deal with this extension in detail, we already
include the instruction asm{il} in SMASM here.
² Analogously to [Sha12], we do not formalize the execution of external procedures in the MASM semantics. Such
procedures will be discussed later when we consider inter-language calls in the mixed machine semantics.
5.1.2 Machine Configuration
As we already mentioned, the semantics of macro assembly, like many other programming lan-
guages, does not consider the address translation and interrupt mechanism. Moreover, the store
buffer is also supposed to be invisible. Basically, one can consider the state of the MASM ma-
chine as a combination of a simplified version of the single core MIPS-86 with a stack abstraction
modelled as a sequence of procedure frames.
Definition 5.5 (MASM Stack Frame). We define the MASM stack frames as tuples from the set
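\[ frame_{MASM} \overset{\text{def}}{\equiv} (p \in P_{name},\ loc \in \mathbb{N},\ pars \in (\mathbb{B}^{32})^{*},\ lifo \in (\mathbb{B}^{32})^{*},\ saved : \mathbb{B}^{5} \rightharpoonup \mathbb{B}^{32}) \]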
comprising the following components:
• p – the name of the procedure the stack frame corresponds to,
• loc – the location counter denoting the index of the next instruction to be executed in the
body of the MASM procedure,
• pars – a list of input parameters passed to the procedure,
• lifo – the actual stack component used for storing temporary data and passing input pa-
rameters to procedures called from p,
• saved – a buffer for keeping the content of registers listed in the uses component of the
procedure declaration.
Note that in the original MASM semantics from [Sha12] the stack could also be not set up
and, therefore, absent. In this case it was modelled as a pair of the procedure name and
the location counter. The semantics allowed switching to the abstract stack with frames by
writing initially specified stack frame base and stack pointers, or dropping the stack abstraction
by rewriting these pointers when the stack was completely empty. Since later in this thesis we
provide a more powerful mechanism for stack substitution that includes the trivial cases treated
in [Sha12], we consider here a shorter version of MASM where the stack is always present.
Given the abstraction of the stack frames, we now are able to define the state of the MASM
machine.
Definition 5.6 (Sequential MASM Configuration). The MASM configuration is a tuple
\[ C_{MASM} \overset{\text{def}}{\equiv} (stack \in frame^{*}_{MASM},\ gpr : \mathbb{B}^{5} \rightharpoonup \mathbb{B}^{32},\ spr : \mathbb{B}^{5} \to \mathbb{B}^{32},\ M : \mathbb{B}^{32} \to \mathbb{B}^{8}) \]
where stack is the MASM stack with the top-most frame at the end of the sequence, gpr and
spr are the general and special purpose registers respectively, and M is the global
byte-addressable memory.
To refer to the stack and program components more easily, we use a few overloaded
shorthands. Given any MASM configuration cµ ∈ CMASM, stack st ≡ cµ.stack, and program πµ ∈
ProgMASM, we introduce the following notation for:
• the index of the top-most frame:
top(cµ) ≡ top(st) ≡ |st|
• the components X ∈ {p, loc, pars, saved, lifo} of the stack frame with an index i ∈ N_{top(st)}:
\[ X_i(c_\mu) \equiv X_i(st) \equiv st[i].X \]
index i                  shorthand for i and i_5   description
0                        zero                      always contains 0^32
1                        t1                        temporary values
2                        rv                        procedure return value
3                        t2                        temporary values
4, ..., 7                i1 ... i4                 input arguments for procedure calls
8 ... 15, 24 ... 28      t3 ... t15                temporary values
16 ... 23                sv1 ... sv8               callee-save registers
29                       sp                        stack pointer
30                       bp                        stack frame base pointer
31                       ra                        return address
Table 5.1: GPRs Usage Convention for MIPS-86
• the components X of the top-most frame:
\[ X_{top}(c_\mu) \equiv X_{top}(st) \equiv X_{top(st)}(st) \]
• the components Y ∈ {npar, body, uses} of the procedure table entry for the procedure
p_i(st):
\[ Y_i(c_\mu, \pi_\mu) \equiv Y_i(st, \pi_\mu) \equiv \pi_\mu(p_i(st)).Y \]
• the components Y for the procedure of the top-most frame:
\[ Y_{top}(c_\mu, \pi_\mu) \equiv Y_{top}(st, \pi_\mu) \equiv Y_{top(st)}(st, \pi_\mu) \]
• the next MASM instruction to be executed:
\[ instr_{next}(c_\mu, \pi_\mu) \overset{\text{def}}{\equiv} body_{top}(c_\mu, \pi_\mu)[loc_{top}(c_\mu)] \]
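The configuration and the shorthands above can be mirrored in a small executable sketch. It is only an illustration under several assumptions: bit-strings are represented as Python integers, the memory and register files as dictionaries, and the program is reduced to a map `bodies` from procedure names to instruction lists; the class and function names are hypothetical. Note the shift between the 1-based indices of the definitions and the 0-based Python lists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    p: str                                    # procedure name
    loc: int                                  # 1-based location counter
    pars: List[int] = field(default_factory=list)
    lifo: List[int] = field(default_factory=list)
    saved: Dict[int, int] = field(default_factory=dict)


@dataclass
class Config:
    stack: List[Frame]                        # top-most frame is the last element
    gpr: Dict[int, int]
    spr: Dict[int, int]
    mem: Dict[int, int]


def top(c: Config) -> int:
    """Index of the top-most frame: top(c) = |c.stack|."""
    return len(c.stack)


def frame_at(c: Config, i: int) -> Frame:
    """Frame with the 1-based index i."""
    return c.stack[i - 1]


def instr_next(c: Config, bodies: Dict[str, List[Tuple]]) -> Tuple:
    """Next instruction: body of the top-most procedure at its location counter."""
    fr = frame_at(c, top(c))
    return bodies[fr.p][fr.loc - 1]


# Hypothetical example: main has called f and f is about to return.
bodies = {"main": [("push", 8), ("call", "f"), ("ret",)], "f": [("ret",)]}
c = Config(stack=[Frame("main", 2), Frame("f", 1)], gpr={}, spr={}, mem={})
```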
5.1.3 Compiler Calling Convention for MIPS-86
In general, a calling convention describes certain rules for the low-level implementation of the
interaction between a caller and a callee. These rules are followed by compilers and are also
especially important when one has to support calls between functions/procedures written in
different programming languages.
In contrast to programming in higher-level languages, whose compilers usually allow
programmers to abstract from concrete implementations, macro assembly programming
assumes knowledge of the mechanism for passing the input parameters and returning
the result. This is directly required by the MASM semantics. Moreover, one has to know which
registers may be accessed and which must be preserved by the programmer. Hence, we discuss
these rules already in the MASM semantics before we consider the compiler correctness.
Obviously, one has to agree how the general purpose registers could be used by the compiler
and the programmer. In this thesis we rely on the usage of the GPRs (Table 5.1) borrowed
from [Bau14b]. This setting is based on the MIPS calling convention applied in [Inc06] and
differs from the original version introduced in [Sha12] for the VAMP processor.
In particular, in addition to the stack and base pointers, a register containing the return
address is introduced. These three registers are relevant for the implementation of the semantics
in MIPS-86 assembly by the compiler, and should not be accessed by the MASM programmer.
Furthermore, we explicitly distinguish between so-called callee- and caller-save registers.
Caller-save (or volatile, scratch) registers are registers that can be used by the callee without
any restrictions. The caller must save their content if it needs this content after return from the
callee.
Callee-save (or non-volatile) registers are registers whose content must be saved by the callee
before it overwrites them. The caller expects the callee-save registers to be restored after return
from the callee.
Finally, combining the specifications from [Sha12, SS12, Bau14b], we state the rules of the
compiler calling convention for the MIPS-86:
1. The first four input parameters are passed through the registers i1 . . . i4 before the call
instruction is executed. The caller does not expect the same register values after the callee
returns.
2. The remaining parameters (if any) are passed on the stack in right-to-left order before
the call instruction is executed. There is also a space (so-called home addresses³) reserved
on the stack for parameters passed in registers.
3. The return value from a procedure call is passed through the register rv .
4. All callee-save registers must be restored before return.
5. A callee is responsible for cleaning up the stack from the parameters.
Note that, as follows from Table 5.1 and the discussion above, we refer to the registers t1 . . . t15,
i1 . . . i4, and rv as caller-save registers available to the MASM programmer. The sets of indices
for callee- and caller-save registers Regcallee, Regcaller ⊂ B5 respectively can then be formally
defined as
\[ Reg_{callee} \overset{\text{def}}{\equiv} \{ sv_i \mid sv_i \in \mathbb{B}^{5} \wedge i \in \mathbb{N}_8 \} \]
\[ Reg_{caller} \overset{\text{def}}{\equiv} \mathbb{B}^{5} \setminus \{zero, sp, bp, ra\} \setminus Reg_{callee} \]
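The two register sets can be computed directly from the indices in Table 5.1. The following sketch represents register indices as natural numbers instead of 5-bit strings, which is an assumption made for readability; the constant names are hypothetical.

```python
# Register indices taken from Table 5.1 (as natural numbers, not bit-strings).
ZERO, SP, BP, RA = 0, 29, 30, 31

# sv1 ... sv8 occupy the indices 16 ... 23.
REG_CALLEE = frozenset(range(16, 24))

# All 32 GPR indices without zero, sp, bp, ra and without the callee-save registers.
REG_CALLER = frozenset(range(32)) - {ZERO, SP, BP, RA} - REG_CALLEE
```

The result consists of the indices 1 . . . 15 and 24 . . . 28, i.e., exactly the registers t1 . . . t15, i1 . . . i4, and rv.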
Definition 5.7 (Number of Parameters Passed on the MASM Stack and through Registers).
In order to compute the number of parameters to be passed through registers for a call of a
procedure p ∈ Pname declared in a MASM program πµ ∈ ProgMASM, we define the auxiliary
function
\[ npar^{\pi_\mu}_{regs}(p) \overset{\text{def}}{\equiv} \begin{cases} 4 & : \pi_\mu(p).npar > 4 \\ \pi_\mu(p).npar & : \text{otherwise} \end{cases} \]
Hence, the number of parameters to be passed on the stack is computed as
\[ npar^{\pi_\mu}_{stack}(p) \overset{\text{def}}{\equiv} \pi_\mu(p).npar - npar^{\pi_\mu}_{regs}(p) \]
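As a quick arithmetic check of Definition 5.7, the two functions can be sketched as follows. For brevity the sketch takes the declared number of parameters directly instead of looking it up in the procedure table; the function names are hypothetical.

```python
REG_PARAMS = 4  # parameters passed through i1 ... i4


def npar_regs(npar: int) -> int:
    """Number of parameters passed through registers."""
    return REG_PARAMS if npar > REG_PARAMS else npar


def npar_stack(npar: int) -> int:
    """Number of parameters passed on the stack."""
    return npar - npar_regs(npar)
```

For a procedure with six parameters, four are passed in registers and two on the stack; for a procedure with three parameters, all three are passed in registers.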
5.1.4 Transition Function
First, we introduce functions performing specific updates on the MASM configuration.
Definition 5.8 (MASM Configuration Update Functions). For a given MASM configuration
cµ ∈ CMASM with the stack cµ.stack ≠ ε, we define the following operations on the top-most
frame t ≡ top(cµ) of the stack:
³ According to Microsoft [Mic15], “Home addresses are required for the register arguments so a contiguous area is
available in case the called function needs to take the address of the argument list ... or an individual argument.
This area also provides a convenient place to save register arguments ... and may be used by the called function for
other purposes besides saving parameter register values.”
• incrementing the location counter⁴:
\[ inc_{loc}(c_\mu) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[t \mapsto frame']] \]
with \( frame' \equiv c_\mu.stack[t][loc := loc_{top}(c_\mu) + 1] \)
• setting the location counter to l ∈ N:
\[ set_{loc}(c_\mu, l) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[t \mapsto frame']] \]
with \( frame' \equiv c_\mu.stack[t][loc := l] \)
• removing the top-most frame:
\[ drop_{frame}(c_\mu) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[1 : t - 1]] \]
• pushing a sequence el ∈ (B32)∗ of elements on the lifo:
\[ push_{lifo}(c_\mu, el) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[t \mapsto frame']] \]
with \( frame' \equiv c_\mu.stack[t][lifo := lifo_{top}(c_\mu) \circ el] \)
• removing n ∈ N elements from the lifo if \( |lifo_{top}(c_\mu)| \ge n \):
\[ pop_{lifo}(c_\mu, n) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[t \mapsto frame']] \]
with \( frame' \equiv c_\mu.stack[t][lifo := lifo_{top}(c_\mu)[1 : |lifo_{top}(c_\mu)| - n]] \)
• updating a parameter i ∈ N on the stack by a value v ∈ B32:
\[ set_{pars}(c_\mu, i, v) \overset{\text{def}}{\equiv} c_\mu[stack := c_\mu.stack[t \mapsto frame']] \]
with \( frame' \equiv c_\mu.stack[t][pars := pars'] \) and \( pars' \equiv pars_{top}(c_\mu)[i \mapsto v] \)
Obviously, in the definition of the MASM semantics we will be interested only in valid, or, in
other words, well-formed states of the MASM machine.
Definition 5.9 (Well-Formed MASM Stack). We call a MASM stack st ∈ frame∗MASM well-formed
wrt. a MASM program πµ ∈ ProgMASM if and only if (i) the stack is not empty and (ii) every
stack frame (a) belongs to a procedure defined in the program πµ, (b) has the location counter
pointing to an instruction in the body of this procedure, and exactly the declared (c) number of
parameters and (d) the used registers are stored in the corresponding components of the frame.
\[ wfstack^{\pi_\mu}_{MASM}(st) \overset{\text{def}}{\equiv} \begin{array}{ll}
(i) & st \neq \varepsilon \\
(ii) & \forall i \in \mathbb{N}_{top(st)}. \\
 & (a)\ p_i(st) \in dom(\pi_\mu) \wedge \neg ext(p_i(st), \pi_\mu) \\
 & (b)\ loc_i(st) \in [1 : |body_i(st, \pi_\mu)|] \\
 & (c)\ |pars_i(st)| = npar_i(st, \pi_\mu) \\
 & (d)\ r \in dom(saved_i(st)) \iff \langle r \rangle \in uses_i(st, \pi_\mu)
\end{array} \]
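The four conditions of Definition 5.9 translate almost literally into a checker. The sketch below is only an illustration: frames and procedure table entries are plain dictionaries, register indices are natural numbers rather than bit-strings, and all names (`wf_stack`, `EXTERN`, the example program) are assumptions of the sketch.

```python
EXTERN = "extern"  # stands for the keyword `extern` (hypothetical encoding)


def wf_stack(stack, prog) -> bool:
    """stack: list of frame dicts, bottom frame first;
    prog: procedure name -> dict with the keys npar, body, uses."""
    if not stack:                                        # (i) non-empty stack
        return False
    for fr in stack:                                     # (ii) every frame
        entry = prog.get(fr["p"])
        if entry is None or entry["body"] == EXTERN:     # (a) declared, non-external
            return False
        if not 1 <= fr["loc"] <= len(entry["body"]):     # (b) loc inside the body
            return False
        if len(fr["pars"]) != entry["npar"]:             # (c) declared parameter count
            return False
        if set(fr["saved"]) != set(entry["uses"]):       # (d) exactly the used registers
            return False
    return True


# Hypothetical program and frames used only as an example.
prog = {"f": {"npar": 1, "body": [("ret",)], "uses": [16]}}
good = [{"p": "f", "loc": 1, "pars": [0], "saved": {16: 42}}]
bad = [{"p": "f", "loc": 2, "pars": [0], "saved": {16: 42}}]
```

In this example `good` is well-formed, whereas `bad` violates condition (b) because its location counter points past the one-instruction body.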
⁴ One could have written the definition as: \( inc_{loc}(c_\mu) \overset{\text{def}}{\equiv} c'_\mu \) such that \( loc_{top}(c'_\mu) = loc_{top}(c_\mu) + 1 \) and all other
components of c′µ are equal to the ones from cµ. Instead, we present it in detail in the formal way as it is done
in [Sha12], which better matches computer-aided verification, e.g., when translating the definition into a
tool-specific language.
Moreover, as we have seen in the definition of the sequential MASM configurations, the gen-
eral purpose registers are represented by a partial mapping. Now, after we have considered
the GPRs usage convention, we make the following restriction on this MASM configuration
component.
Definition 5.10 (Well-Formed MASM GPRs). We say that the mapping gpr : B5 ⇀ B32
representing the general purpose registers in the MASM configuration is well-formed iff the GPRs
contain all registers except sp, bp, ra, whose content is not modelled in the semantics.
\[ wfreg_{MASM}(gpr) \overset{\text{def}}{\equiv} dom(gpr) = \mathbb{B}^{5} \setminus \{sp, bp, ra\} \]
Definition 5.11 (Well-Formed MASM Configuration). Hence, a sequential MASM configuration
cµ ∈ CMASM is considered to be well-formed wrt. a MASM program πµ ∈ ProgMASM if and
only if its stack is well-formed and the GPR component represents all registers except sp, bp, ra.
\[ wfconf^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{array}{ll}
(i) & wfstack^{\pi_\mu}_{MASM}(c_\mu.stack) \\
(ii) & wfreg_{MASM}(c_\mu.gpr)
\end{array} \]
Then, the operational semantics of the sequential MASM machine can be formalized in a way
similar to [Sha12].
Definition 5.12 (Sequential MASM Transition Function). For a given MASM program πµ ∈
ProgMASM, we define the transitions of the sequential MASM machine by the function
\[ \delta^{\pi_\mu}_{MASM} : C_{MASM} \to C_{MASM\perp} \]
where \( C_{MASM\perp} \overset{\text{def}}{\equiv} C_{MASM} \cup \{\perp\} \) and ⊥ denotes here an error state.⁵
So, for a configuration cµ ∈ CMASM, the function \( \delta^{\pi_\mu}_{MASM}(c_\mu) \) is defined by a case split on the
instruction to be executed.
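Macros Semantics First, we consider the macros execution:
• Push operation on stack: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{push}\ r \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} inc_{loc}(push_{lifo}(c_\mu, c_\mu.gpr(r_5))) & : r \notin \{sp, bp, ra\} \\ \perp & : \text{otherwise} \end{cases} \]
• Pop operation on stack: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{pop}\ r \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} inc_{loc}(pop_{lifo}(c'_\mu, 1)) & : lifo_{top}(c_\mu) \neq \varepsilon \wedge r \notin \{sp, bp, ra, zero\} \\ \perp & : \text{otherwise} \end{cases} \]
where \( c'_\mu \equiv c_\mu[gpr := c_\mu.gpr[r_5 \mapsto lifo_{top}(c_\mu)[k]]] \) with \( k \equiv |lifo_{top}(c_\mu)| \).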
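The push and pop rules can be exercised on a toy configuration. The sketch below is a simplification under stated assumptions: configurations are nested dictionaries, register indices and values are Python integers, `None` plays the role of the error state ⊥, and the function names are hypothetical. Each step returns a fresh configuration, mirroring the functional style of the definitions.

```python
import copy

ERROR = None                      # stands for the error state ⊥
ZERO, SP, BP, RA = 0, 29, 30, 31  # register indices from Table 5.1


def step_push(c, r):
    """push r: append gpr(r) to the lifo of the top-most frame."""
    if r in (SP, BP, RA):
        return ERROR
    c2 = copy.deepcopy(c)
    top = c2["stack"][-1]
    top["lifo"].append(c["gpr"][r])
    top["loc"] += 1
    return c2


def step_pop(c, r):
    """pop r: move the last lifo element of the top-most frame into gpr(r)."""
    if not c["stack"][-1]["lifo"] or r in (SP, BP, RA, ZERO):
        return ERROR
    c2 = copy.deepcopy(c)
    top = c2["stack"][-1]
    c2["gpr"][r] = top["lifo"].pop()
    top["loc"] += 1
    return c2


# Hypothetical example: copy register 8 into register 9 via the lifo.
c0 = {"stack": [{"p": "main", "loc": 1, "lifo": []}], "gpr": {8: 7, 9: 0}}
c1 = step_push(c0, 8)
c2 = step_pop(c1, 9)
```

After both steps register 9 holds the value of register 8, the lifo is empty again, and the location counter has advanced by two; popping from the empty lifo of `c0` or pushing sp yields the error state.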
⁵ The error state introduced in [Lei08] is used to indicate a run-time error appearing during the program execution
(including errors that could be detected by the static check of the program). Instead of this additional state, one
could define the transition function as a partial mapping. However, such a solution would not match the simulation
theorem for Cosmos machines where one has to guarantee that there are no run-time errors for existing abstract
computations.
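• Parameter load: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{lparam}\ r\ i \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} inc_{loc}(c'_\mu) & : i \le |pars_{top}(c_\mu)| \wedge r \notin \{sp, bp, ra, zero\} \\ \perp & : \text{otherwise} \end{cases} \]
where \( c'_\mu \equiv c_\mu[gpr := c_\mu.gpr[r_5 \mapsto pars_{top}(c_\mu)[i]]] \)
• Parameter store: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{sparam}\ r\ i \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} inc_{loc}(set_{pars}(c_\mu, i, c_\mu.gpr(r_5))) & : i \le |pars_{top}(c_\mu)| \wedge r \notin \{sp, bp, ra\} \\ \perp & : \text{otherwise} \end{cases} \]
• Goto: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{goto}\ l \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} set_{loc}(c_\mu, l) & : l \le |body_{top}(c_\mu, \pi_\mu)| \\ \perp & : \text{otherwise} \end{cases} \]
• Conditional goto: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{ifnez}\ r\ \mathbf{goto}\ l \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} set_{loc}(c_\mu, l) & : l \le |body_{top}(c_\mu, \pi_\mu)| \wedge r \notin \{sp, bp, ra\} \wedge c_\mu.gpr(r_5) \neq 0^{32} \\ inc_{loc}(c_\mu) & : l \le |body_{top}(c_\mu, \pi_\mu)| \wedge r \notin \{sp, bp, ra\} \wedge c_\mu.gpr(r_5) = 0^{32} \\ \perp & : \text{otherwise} \end{cases} \]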
• Procedure call: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{call}\ pn \)
The function \( \delta^{\pi_\mu}_{MASM}(c_\mu) \) does not produce the error state if and only if the following
transition conditions hold:
– the procedure pn is declared as non-external in the program
\[ pn \in dom(\pi_\mu) \wedge \neg ext(pn, \pi_\mu) \]
– the lifo of the top-most frame contains at least as many elements as the number of
procedure parameters to be passed on the stack:
\[ |lifo_{top}(c_\mu)| \ge npar^{\pi_\mu}_{stack}(pn) \]
Then the transition effect is computed as
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} c_\mu[stack := inc_{loc}(pop_{lifo}(c_\mu, npar^{\pi_\mu}_{stack}(pn))).stack \circ frame'] \]
where the components of the newly created stack frame are initialized as follows:
\[ frame'.p = pn \qquad frame'.loc = 1 \qquad frame'.lifo = \varepsilon \]
\[ frame'.saved(r) = \begin{cases} c_\mu.gpr(r) & : \langle r \rangle \in \pi_\mu(pn).uses \\ \text{undefined} & : \text{otherwise} \end{cases} \]
\[ frame'.pars = pars' \circ pars'' \]
[Figure 5.1: Passing parameters on the lifo during the MASM procedure call. Here, the number
of input parameters pars for pn is n ≡ πµ(pn).npar. The figure shows lifo_top(cµ) with the
positions 1, ..., j−1, j, j+1, ..., k, where the last npar_stack(pn) elements at the positions
j, ..., k hold pars_n, ..., pars_5.]
with the sequences⁶ \( pars' \equiv (0^{32}, \ldots, 0^{32}) \) of length \( |pars'| = npar^{\pi_\mu}_{regs}(pn) \) and
\( pars'' \equiv rev(lifo_{top}(c_\mu)[j : k]) \) for the indices \( k \equiv |lifo_{top}(c_\mu)| \), \( j \equiv k + 1 - npar^{\pi_\mu}_{stack}(pn) \) depicted
in Figure 5.1.
If the transition conditions are not met, the result is \( \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \perp \).
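• Return from a procedure: \( instr_{next}(c_\mu, \pi_\mu) = \mathbf{ret} \)
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} drop_{frame}(c_\mu[gpr := gpr']) & : top(c_\mu) > 1 \\ \perp & : \text{otherwise} \end{cases} \]
such that
\[ gpr'(r) = \begin{cases} saved_{top}(c_\mu)(r) & : \langle r \rangle \in \pi_\mu(p_{top}(c_\mu)).uses \\ c_\mu.gpr(r) & : \text{otherwise} \end{cases} \]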
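The construction of the pars component in the procedure call rule is easy to get wrong because of the reversal, so we illustrate it with a small sketch. It covers only the parameter passing of the call rule (not the register saving or the return rule), represents values as Python integers, and uses `None` for a violated transition condition; the function name `pass_parameters` is hypothetical.

```python
def pass_parameters(lifo, npar):
    """Return (caller lifo after the call, pars of the new frame),
    or None if the lifo holds fewer elements than must be passed on the stack."""
    n_regs = min(npar, 4)             # parameters passed through registers
    n_stack = npar - n_regs           # parameters passed on the stack
    if len(lifo) < n_stack:
        return None
    cut = len(lifo) - n_stack         # 0-based counterpart of the index j
    pars_regs = [0] * n_regs          # pars': zero-initialized register slots
    pars_stack = list(reversed(lifo[cut:]))  # pars'': rev(lifo[j : k])
    return lifo[:cut], pars_regs + pars_stack
```

For instance, with six parameters and the caller lifo (10, 60, 50), where 60 was pushed for the sixth parameter and 50 afterwards for the fifth (right-to-left order), the new frame receives pars = (0, 0, 0, 0, 50, 60) and the caller keeps the lifo (10).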
Assembly Instructions In case of an assembly instruction \( instr_{next}(c_\mu, \pi_\mu) \in I^{MASM}_{ASM} \), we
define \( \delta^{\pi_\mu}_{MASM}(c_\mu) \) on the basis of the transition function δinstr for the non-interrupted instruction
execution from Chapter 3 and the memory update semantics for the SB reduced MIPS-86 from
Chapter 4. For this purpose we introduce or overload the following shorthands:
• the MIPS-86 instruction to be executed:
\[ I_\mu \equiv I(c_\mu, \pi_\mu) \overset{\text{def}}{\equiv} code_{ASM}(instr_{next}(c_\mu, \pi_\mu)) \]
• the processor core configuration coreMIPS(cµ) ∈ Ccore constructed from the MASM con-
figuration cµ such that coreMIPS(cµ)
def≡ c with register files c.spr = cµ.spr, c.gpr such
that
c.gpr(r) =
{
cµ.gpr(r) : r /∈ {sp, bp, ra}
 B32 : otherwise
and any value of the program counter c.pc =  B32
• the data word read from the memory:
\[ R_\mu \equiv R(c_\mu, \pi_\mu) \equiv c_\mu.M_4(ea(c, I_\mu)) \]
⁶ In the original work [Sha12] pars′ is initialized with arbitrary values, introducing non-determinism into the MASM
semantics. As we have mentioned in Section 1.2, in this work we model systems as deterministic automata and,
therefore, in this case we would have had to provide the MASM transition function with an extra input containing
such initial values. However, in order to make the semantics simpler and avoid this extra alphabet, we initialize
pars′ with zeros.
• the next processor core configuration: c′ ≡ δinst(c, Iµ, Rµ)
Similarly to the macros, we are not allowed to access the registers whose values are not
modeled in the semantics. These registers are identified by the fields of the instruction to be
executed. However, rs, rt, and rd are defined only for those instructions in which the
corresponding fields are present. We define in detail a predicate indicating that the assembly instruction
\( instr_{next}(c_\mu, \pi_\mu) \in I^{MASM}_{ASM} \) to be executed in the MASM configuration cµ does not access the
registers sp, bp, ra and does not write to the register zero:
\[ accregs^{MASM}_{ASM}(c_\mu, \pi_\mu) \overset{\text{def}}{\equiv} \begin{array}{ll}
(i) & rt(I_\mu) \notin \{sp, bp, ra\} \\
(ii) & \neg(lui(I_\mu) \vee sll(I_\mu) \vee srl(I_\mu) \vee sra(I_\mu)) \implies rs(I_\mu) \notin \{sp, bp, ra\} \\
(iii) & gprw(I_\mu) \implies cad(I_\mu) \notin \{sp, bp, ra, zero\}
\end{array} \]
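Now the result of the transition can be easily computed as
\[ \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \begin{cases} inc_{loc}(c'_\mu) & : accregs^{MASM}_{ASM}(c_\mu, \pi_\mu) \\ \perp & : \text{otherwise} \end{cases} \]
where
\[ c'_\mu.gpr = c'.gpr \qquad c'_\mu.spr = c'.spr \qquad c'_\mu.stack = c_\mu.stack \]
\[ c'_\mu.M = \begin{cases} \delta_m(c_\mu.M, (ea(c, I_\mu), sv(c, I_\mu))) & : locksw(I_\mu) \vee sw(I_\mu) \\ \delta_m(c_\mu.M, (c.gpr(rd(I_\mu)), ea(c, I_\mu), sv(c, I_\mu))) & : cas(I_\mu) \\ c_\mu.M & : \text{otherwise} \end{cases} \]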
Inline Assembly As we mentioned before, we will consider the extended semantics with
inline assembly as a separate topic later in this work. Therefore, for the time being, in case of
\( instr_{next}(c_\mu, \pi_\mu) = \mathbf{asm}\{il\} \) we simply generate the run-time error \( \delta^{\pi_\mu}_{MASM}(c_\mu) \overset{\text{def}}{\equiv} \perp \).
5.2 Sequential Intermediate C Semantics
5.2.1 Types and Qualifiers
Definition 5.13 (C-IL Types). Let TC denote the set of composite (struct) types and TP be the set
of primitive types containing signed and unsigned 32-bit integers and the type void:
\[ T_P \overset{\text{def}}{\equiv} \{\mathbf{void}, \mathbf{i32}, \mathbf{u32}\} \]
Then the set of C-IL types T is constructed as follows:
• primitive types: t ∈ TP =⇒ t ∈ T
• struct types: tc ∈ TC =⇒ (struct tc) ∈ T
• regular pointer types: t ∈ T =⇒ ptr(t) ∈ T
• array types: t ∈ T ∧ n ∈ N =⇒ array(t, n) ∈ T
An array type array(t, n) is characterized by the type t of elements and the number n of
elements in the array.
• function pointer types: t ∈ T ∧ T ∈ (T \ {void})∗ =⇒ funptr(t, T ) ∈ T
A function pointer type funptr(t, T ) is described by the type t of the return value and the
list T of parameter types.
Note that we do not explicitly provide a type for boolean values, which could be modeled by
the integer type in our case.
Besides the types, the C intermediate language supports type qualifiers const and volatile
corresponding to the ones from regular C. The qualifiers annotate the type declaration and hint
the compiler how to treat variables of such types.
Naturally, const is used to declare constant variables that cannot be re-written. In turn,
volatile indicates that the content of the corresponding memory region might be modified by
the environment, e.g., other threads, devices, or interrupt handlers. Therefore, in this case the
compiler performs less optimization and does not reorder memory accesses to such variables.
Definition 5.14 (Type Qualifiers). We denote the set of type qualifiers as
\[ Q \overset{\text{def}}{\equiv} \{\mathbf{const}, \mathbf{volatile}\} \]
Any C-IL type can be annotated with a subset of qualifiers.
Definition 5.15 (Qualified C-IL Types). We define the set of qualified types TQ as a set containing
the following elements:
• the type void cannot be qualified: (∅,void) ∈ TQ
• qualified primitive types: q ⊆ Q ∧ t ∈ {i32,u32} =⇒ (q, t) ∈ TQ
• qualified struct types: q ⊆ Q ∧ tc ∈ TC =⇒ (q, struct tc) ∈ TQ
• qualified regular pointer types: q ⊆ Q ∧ t ∈ TQ =⇒ (q,ptr(t)) ∈ TQ
• qualified array types: q ⊆ Q ∧ t ∈ TQ ∧ n ∈ N =⇒ (q,array(t, n)) ∈ TQ
• qualified function pointer types:
q ⊆ Q ∧ t ∈ TQ ∧ T ∈ (TQ \ {(∅,void)})∗ =⇒ (q, funptr(t, T )) ∈ TQ
As we mentioned before, the type qualifiers are only used by the compiler. In fact, they do not
affect the C-IL semantics and can be dropped in formal definitions for brevity whenever possi-
ble. For this purpose we introduce a function converting qualified types to the corresponding
unqualified ones.
Definition 5.16 (Conversion from Qualified Type to Unqualified Type). We define the function
qt2t : TQ → T converting any qualified C-IL type to an unqualified type as follows:
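\[ qt2t(x) \overset{\text{def}}{\equiv} \begin{cases}
t & : x = (q, t) \wedge t \in T_P \\
\mathbf{ptr}(qt2t(x')) & : x = (q, \mathbf{ptr}(x')) \\
\mathbf{array}(qt2t(x'), n) & : x = (q, \mathbf{array}(x', n)) \\
\mathbf{funptr}(qt2t(x'), map(qt2t, X)) & : x = (q, \mathbf{funptr}(x', X)) \\
\mathbf{struct}\ t_c & : x = (q, \mathbf{struct}\ t_c)
\end{cases} \]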
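The recursive structure of qt2t can be made concrete with a small sketch. The encoding is an assumption of the illustration: a qualified type is a pair of a qualifier set and a type term, primitive types are strings, and the remaining type constructors are tagged tuples.

```python
def qt2t(x):
    """Drop all qualifiers from a qualified type x = (q, t)."""
    _q, t = x
    if isinstance(t, str):                 # primitive type: "void", "i32", "u32"
        return t
    tag = t[0]
    if tag == "struct":                    # ("struct", tc)
        return t
    if tag == "ptr":                       # ("ptr", x')
        return ("ptr", qt2t(t[1]))
    if tag == "array":                     # ("array", x', n)
        return ("array", qt2t(t[1]), t[2])
    if tag == "funptr":                    # ("funptr", x', (x1, ..., xk))
        return ("funptr", qt2t(t[1]), tuple(qt2t(a) for a in t[2]))
    raise ValueError(f"not a qualified type: {x!r}")


# Hypothetical example: a const pointer to a volatile u32.
const_ptr_to_volatile_u32 = (
    frozenset({"const"}),
    ("ptr", (frozenset({"volatile"}), "u32")),
)
```

Applying `qt2t` to the example yields `("ptr", "u32")`, i.e., the qualifiers are removed at every nesting level.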
Definition 5.17 (Type Predicates). We define predicates checking whether a given type t ∈ T is
a pointer type, an array type, or a function pointer type:
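\[ isptr(t) \overset{\text{def}}{\equiv} \exists t'.\ t = \mathbf{ptr}(t') \]
\[ isarray(t) \overset{\text{def}}{\equiv} \exists t', n.\ t = \mathbf{array}(t', n) \]
\[ isfunptr(t) \overset{\text{def}}{\equiv} \exists t', T.\ t = \mathbf{funptr}(t', T) \]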
5.2.2 Values
In contrast to many programming language semantics, most values used in C-IL are modeled
as bit-strings of an appropriate length and a type.
Definition 5.18 (Primitive Values). The set valprim contains all values for variables of primitive
type.
\[ val_{prim} \overset{\text{def}}{\equiv} \bigcup_{b \in \mathbb{B}^{32}} \{\mathbf{val}(b, \mathbf{i32}), \mathbf{val}(b, \mathbf{u32})\} \]
Note that we do not introduce a value for the type void because this type explicitly declares
its absence.
Since any array value in C is treated as a pointer to the first element of the array, we also
model array values as pointers with an array type. First, we introduce the values of pointers to
the global memory, represented as a pair of an address (a 32-bit string according to the
MIPS-86 ISA) and a corresponding type.
Definition 5.19 (Regular Pointer and Array Values). We define the set valptr of values for
regular pointers and arrays as
\[ val_{ptr} \overset{\text{def}}{\equiv} \bigcup_{t \in T \wedge (isptr(t) \vee isarray(t))} \{\mathbf{val}(a, t) \mid a \in \mathbb{B}^{32}\} \]
Along with the regular pointer and array values we distinguish pointers to local variables
residing on the stack. Such pointer values are called local references.
Definition 5.20 (Local References). Let V be the set of all variable names. Then all local refer-
ences are defined as tuples containing a name of a local variable v, a byte offset o in the byte
representation of this variable, an index i of a stack frame corresponding to a call of a function
in whose body this variable is used, and a type t of the variable.
\[ val_{lref} \overset{\text{def}}{\equiv} \bigcup_{t \in T \wedge (isptr(t) \vee isarray(t))} \{\mathbf{lref}((v, o), i, t) \mid v \in V \wedge o, i \in \mathbb{N}_0\} \]
In the C-IL semantics we provide two kinds of function pointer values: actual function pointer
values and symbolic function values. The former represent a memory address where the
compiled code of a function starts. We will later introduce a specific environment parameter
that, depending on the compiler, computes such addresses for given function names. The latter
are used when we do not need to store function pointers in the memory, e.g., for inline or
external functions. In this case a symbolic value can only be used to perform a function call.
Definition 5.21 (Function Pointer Values). Let Fname be a set of function names. Then the sets
val fptr and val fun of values for C-IL function pointers are defined as follows:
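\[ val_{fptr} \overset{\text{def}}{\equiv} \bigcup_{t \in T \wedge isfunptr(t)} \{\mathbf{val}(a, t) \mid a \in \mathbb{B}^{32}\} \]
\[ val_{fun} \overset{\text{def}}{\equiv} \{\mathbf{fun}(f, t) \mid f \in F_{name} \wedge isfunptr(t)\} \]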
Definition 5.22 (C-IL Values). Now, the set of all C-IL values can be defined as a union of all
individual values considered above:
\[ val \overset{\text{def}}{\equiv} val_{prim} \cup val_{ptr} \cup val_{lref} \cup val_{fptr} \cup val_{fun} \]
5.2.3 Expressions and Statements
In order to define C-IL programs, we introduce expressions and statements supported in the
C-IL semantics.
Definition 5.23 (Unary and Binary Operators). The setsO1 andO2 of unary and binary operators
occurring in C-IL expressions are defined as follows:
O1 ⊂ { |  : val ⇀ val}
O2 ⊂ {⊗ | ⊗ : (val × val) ⇀ val}
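\[ O_1 \overset{\text{def}}{\equiv} \{\texttt{-}, \texttt{\~{}}, \texttt{!}\} \]
\[ O_2 \overset{\text{def}}{\equiv} \{\texttt{+}, \texttt{-}, \texttt{*}, \texttt{/}, \texttt{\%}, \texttt{<<}, \texttt{>>}, \texttt{<}, \texttt{>}, \texttt{<=}, \texttt{>=}, \texttt{==}, \texttt{!=}, \texttt{\&}, \texttt{|}, \texttt{\^{}}, \texttt{\&\&}, \texttt{||}\} \]
The operators supported in C-IL fully correspond to the ones used in C, and we omit their
mathematical definitions here. For more details, please refer to [Sch13]. Note that the addition
operation allows pointer arithmetic to be performed. When a primitive value is added to a regular
pointer, array value, or local reference, this primitive value is multiplied by the size of the type
pointed to.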
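The scaling rule for pointer addition can be checked with a short calculation. The sketch below is not the operator definition from [Sch13]: it only covers regular pointers, treats the added primitive value as a (possibly negative) Python integer, and assumes that i32, u32 and pointers occupy 4 bytes and that addresses wrap around modulo 2^32; the names `size_of` and `ptr_add` are hypothetical.

```python
WORD = 2 ** 32                 # addresses are 32-bit strings
SIZE = {"i32": 4, "u32": 4}    # assumed sizes of the primitive types in bytes


def size_of(t):
    """Size in bytes of a type term (strings and tagged tuples)."""
    if isinstance(t, str):
        return SIZE[t]
    if t[0] == "ptr":
        return 4               # assumption: pointers are 32 bits wide
    if t[0] == "array":        # ("array", elem, n)
        return t[2] * size_of(t[1])
    raise ValueError(f"unsupported type: {t!r}")


def ptr_add(ptr, i):
    """Add the integer i to a pointer value (addr, ("ptr", elem))."""
    addr, t = ptr
    return ((addr + i * size_of(t[1])) % WORD, t)
```

Adding 3 to a pointer to u32 at address 0x1000 gives the address 0x100C, while adding 1 to a pointer to an array of four i32 elements advances the address by 16 bytes.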
First, in addition to the set V of variable names and the set Fname of function names present
in our semantics, we introduce F denoting the set of all field names used in composite types.
Definition 5.24 (C-IL Expressions). The set E contains all possible C-IL expressions and is defined
as the set obeying the following rules:
• constants: c ∈ val =⇒ c ∈ E
• variable names: v ∈ V =⇒ v ∈ E
• function names: fn ∈ Fname =⇒ fn ∈ E
• unary operations: e ∈ E ∧  ∈ O1 =⇒ e ∈ E
• binary operations: e1, e2 ∈ E ∧ ⊗ ∈ O2 =⇒ (e1 ⊗ e2) ∈ E
• ternary operation: e, e1, e2 ∈ E =⇒ (e ? e1 : e2) ∈ E
• type cast: t ∈ TQ ∧ e ∈ E =⇒ (t)e ∈ E
• pointer dereferencing: e ∈ E =⇒ ∗(e) ∈ E
• address of: e ∈ E =⇒ &(e) ∈ E
• field access: e ∈ E ∧ f ∈ F =⇒ (e).f ∈ E
• size of a type: t ∈ TQ =⇒ sizeof(t) ∈ E
• size of an expression: e ∈ E =⇒ sizeof(e) ∈ E
The array and pointer field accesses typical for C do not belong to the C-IL expressions but
can be easily translated into them:
\[ a[i] \overset{\text{def}}{\equiv} *(a + i) \qquad b\texttt{->}f \overset{\text{def}}{\equiv} (*(b)).f \]
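The two translations can be written as functions on expression trees. The tagged-tuple encoding of C-IL expressions and the function names below are assumptions of this sketch and do not appear in the formal definitions.

```python
def index(a, i):
    """a[i]  is translated to  *(a + i)."""
    return ("deref", ("binop", "+", a, i))


def arrow(b, f):
    """b->f  is translated to  (*(b)).f."""
    return ("field", ("deref", b), f)


# Hypothetical example: the C expression  p->next[2]
expr = index(arrow(("var", "p"), "next"), ("const", 2))
```

The example expands to a dereference of the sum of the field access `(*(p)).next` and the constant 2, so only the expression forms of Definition 5.24 remain.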
Definition 5.25 (C-IL Statements). We define the set SCIL of C-IL statements inductively as fol-
lows:
• assignment: e0, e1 ∈ E =⇒ (e0 = e1) ∈ SCIL
• goto: l ∈ N =⇒ (goto l) ∈ SCIL
• if-not-goto: l ∈ N ∧ e ∈ E =⇒ (ifnot e goto l) ∈ SCIL
• function call with a return value: e0, e ∈ E ∧ E ∈ E∗ =⇒ (e0 = call e(E)) ∈ SCIL
• function call without a return value: e ∈ E ∧ E ∈ E∗ =⇒ (call e(E)) ∈ SCIL
• return from a function: e ∈ E =⇒ (return e) ∈ SCIL and return ∈ SCIL
For both kinds of function calls, E ∈ E∗ represents a list of expressions passed to a function as
parameters.
5.2.4 C-IL Programs
All relevant information about functions of a C-IL program is represented by a function table
consisting of function table entries.
Definition 5.26 (C-IL Function Table Entries). The set FunT of function table entries corre-
sponding to each function of a C-IL program is defined as
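\[ \mathit{FunT} \overset{\text{def}}{\equiv} (\mathit{rettype} \in \mathbb{T}_Q,\ \mathit{npar} \in \mathbb{N}_0,\ V \in (\mathbb{V} \times \mathbb{T}_Q)^*,\ P \in S_{CIL}^* \cup \{\mathbf{extern}\}) \]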
with the following components:
• rettype – the type of the function’s return value which can be also void in case of the
absence of the return value,
• npar – the number of input parameters of the function,
• V – a list of parameter and local variable declarations consisting of pairs of a variable
name and its type. The first npar elements of the list represent the input parameters.
• P – either the function body containing a list of C-IL statements, or the keyword extern
indicating that the function is declared as an external function.7
7 In contrast to [Sch13], we do not formalize external function execution in general in the pure C-IL semantics;
the semantics treats only one particular intrinsic function supported by the MIPS-86 ISA. Later in this thesis,
however, we consider external functions in detail for extended models (e.g., the mixed semantics) in a way similar
to [Sha12].
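Definition 5.27 (C-IL Program). C-IL programs are represented by tuples from the set
\[ \mathit{Prog}_{CIL} \overset{\text{def}}{\equiv} (V_G \in (\mathbb{V} \times \mathbb{T}_Q)^*,\ T_F : \mathbb{T}_C \rightharpoonup (\mathbb{F} \times \mathbb{T}_Q)^*,\ \mathcal{F} : \mathbb{F}_{name} \rightharpoonup \mathit{FunT}) \]
with the following elements: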
• VG – a list of global variable declarations consisting of names of global variables and their
qualified types,
• TF – a type table for struct types providing for each such type a list of its fields with their
types,
• F – the function table modelled as a mapping from a function name to a corresponding
function table entry.
5.2.5 Machine Configuration
In order to define the semantics for C-IL programs, we proceed with a definition of the C-
IL machine state (or configuration). Basically, such a configuration includes the global byte-
addressable memory and an abstract stack modelled as a sequence of frames corresponding to
function calls.
Definition 5.28 (C-IL Stack Frames). C-IL stack frames are tuples from the set
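\[ \mathit{frame}_{CIL} \overset{\text{def}}{\equiv} (f \in \mathbb{F}_{name},\ \mathit{rds} \in \mathit{val}_{ptr} \cup \mathit{val}_{lref} \cup \{\bot\},\ \mathit{loc} \in \mathbb{N},\ \mathcal{M}_E : \mathbb{V} \rightharpoonup (\mathbb{B}^8)^*) \]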
and contain the following components:
• f – the name of the function, which the stack frame belongs to,
• rds – the return value destination describing where the return value of a function call has
to be stored on the return. The return destination can be a pointer, a local reference, or the
value ⊥ denoting the absence of the return destination.
• loc – the location counter showing the index of the next statement to be executed in the
body of the function,
• ME – the memory for local variables and parameters in the call of the function f . This
memory on the stack (or local memory) is modeled as a mapping from a name of a declared
variable to the byte-string representation of its value.
Definition 5.29 (Sequential C-IL Configuration). A C-IL configuration is a tuple
\[ C_{CIL} \overset{\text{def}}{\equiv} (\mathit{stack} \in \mathit{frame}_{CIL}^*,\ \mathcal{M} : \mathbb{B}^{32} \to \mathbb{B}^8) \]
where stack is the C-IL stack with the top-most frame at the end of the sequence, and M is the
global byte-addressable memory.
Analogously to MASM we introduce a number of overloaded shorthands for c ∈ CCIL, st ≡
c.stack, and π ∈ ProgCIL:
• the index of the top-most frame:
\[ \mathit{top}(c) \equiv \mathit{top}(st) \equiv |st| \]
• the components X ∈ {f, rds, loc, ME} of the stack frame with an index i ∈ N_{top(st)}:
\[ X_i(c) \equiv X_i(st) \equiv st[i].X \]
• the components X of the top-most frame:
\[ X_{top}(c) \equiv X_{top}(st) \equiv X_{\mathit{top}(st)}(st) \]
• the components Y ∈ {rettype, npar, V, P} of the function table entry for the function f_i(st):
\[ Y_i(c, \pi) \equiv Y_i(st, \pi) \equiv \pi.\mathcal{F}(f_i(st)).Y \]
• the components Y for the function of the top-most frame:
\[ Y_{top}(c, \pi) \equiv Y_{top}(st, \pi) \equiv Y_{\mathit{top}(st)}(st, \pi) \]
• the next C-IL statement to be executed:
\[ \mathit{stmt}_{next}(c, \pi) \equiv P_{top}(c, \pi)[\mathit{loc}_{top}(c)] \]
5.2.6 Environment Parameters
For program execution in C-IL we need additional information from the compiler and the un-
derlying architecture, e.g., the composite types layout in the memory, base addresses of global
variables, etc. In [Sch13] this information is collected in so called environment parameters because
the C-IL semantics is designed in a general way without a particular connection to a certain C
compiler and a processor architecture.
Here, we do not consider all parameters from [Sch13] and concentrate only on those required
for our MIPS-86 based version of the C-IL. Note that the endianness of the MIPS-86 architecture
is little endian and for brevity we do not include it in the environment parameters.
Definition 5.30 (C-IL Environment Parameters). We define the C-IL environment parameters
θ ∈ ParamsCIL as a tuple
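\[ \mathit{Params}_{CIL} \overset{\text{def}}{\equiv} (\mathit{alloc}_{gvar},\ \mathcal{F}_{adr},\ \mathit{size}_{struc},\ \mathit{offset}_{struc},\ \mathit{size\_t},\ \mathit{cast},\ \mathit{intrinsics}) \]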
with the following components:
• allocgvar : V ⇀ B32 – a mapping from the name of a global variable to the variable’s base
address in the memory,
• Fadr : Fname ⇀ B32 – a mapping from a given C-IL function name to the starting address
of the function’s compiled code in the memory (undefined for inline and external
functions),
• sizestruc : TC ⇀ N – the size of struct types in bytes,
• offsetstruc : TC × F⇀ N0 – byte-offsets of fields in struct types,
• size_t ∈ TP – the type of the value returned by the operator sizeof,
• cast : val × T ⇀ val – a function describing how the compiler handles the type casting
(an example of its in-depth definition can be found in [Sch13]). Note that C-IL is assumed
to be type-correct, i.e., all required type casts are made explicitly in the code with the help
of the corresponding type cast expression.
• intrinsics : Fname ⇀ FunT – a function table for compiler intrinsic functions. According
to [Sch13], such functions provide access to architecture-specific instructions and are usually
inlined into the compiled code instead of performing ordinary function calls.
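As in [Bau14b], in this work we consider only one intrinsic function, which corresponds
to the MIPS-86 compare-and-swap instruction cas. Therefore, for cas ∈ Fname we introduce
\[ \theta.\mathit{intrinsics}(cas) \overset{\text{def}}{\equiv} \mathit{fte}_{cas} \]
such that the function table entry fte_cas is defined as
\[ \begin{aligned}
\mathit{fte}_{cas}.\mathit{rettype} &= (\emptyset, \mathbf{void}) \\
\mathit{fte}_{cas}.\mathit{npar} &= 4 \\
\mathit{fte}_{cas}.V &= (dest, (\emptyset, \mathbf{ptr}(\emptyset, \mathbf{i32}))) \circ (cmp, (\emptyset, \mathbf{i32})) \circ (exch, (\emptyset, \mathbf{i32})) \circ (ret, (\emptyset, \mathbf{ptr}(\emptyset, \mathbf{i32}))) \\
\mathit{fte}_{cas}.P &= \mathbf{extern}
\end{aligned} \]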
where dest is a pointer to the memory location whose content is replaced by the value exch
if this content is equal to the compared value cmp. Additionally, the memory content pointed
to by dest before the swap is stored at the location pointed to by ret.
Given these environment parameters θ ∈ ParamsCIL, we can now define a function calculat-
ing the size of a given type in bytes.
Definition 5.31 (Size of a C-IL Type). We define the function sizeθ(t) for types t ∈ T as
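\[ \mathit{size}_\theta(t) \overset{\text{def}}{\equiv} \begin{cases}
4 & : t \in \{\mathbf{i32}, \mathbf{u32}\} \lor \mathit{isptr}(t) \lor \mathit{isfunptr}(t) \\
n \cdot \mathit{size}_\theta(t') & : t = \mathbf{array}(t', n) \\
\theta.\mathit{size}_{struc}(t_C) & : t = \mathbf{struct}\ t_C
\end{cases} \]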
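The recursion over array types can be sketched as follows. The tuple encoding of types and the dictionary `size_struc` standing in for θ.sizestruc are assumptions made only for this illustration.

```python
# Hypothetical encoding of C-IL types as nested tuples:
#   ("i32",), ("u32",), ("ptr", t), ("funptr",), ("array", t, n), ("struct", name)
WORD_BYTES = 4  # primitive, pointer, and function pointer types occupy 4 bytes

def size_of(t, size_struc):
    """Size of type t in bytes; size_struc plays the role of theta.size_struc."""
    kind = t[0]
    if kind in ("i32", "u32", "ptr", "funptr"):
        return WORD_BYTES
    if kind == "array":
        _, elem, n = t
        return n * size_of(elem, size_struc)
    if kind == "struct":
        return size_struc[t[1]]  # a KeyError models an unknown struct type
    raise ValueError(f"size is not defined for type {t!r}")
```

For instance, an array of three structs of 8 bytes each has size 24, since the struct size is taken from the environment parameters and multiplied by the array length.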
Definition 5.32 (Zero Value). Moreover, having the size of a type for a given value x ∈ val , it is
easy to check whether x is considered to be zero or not:
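\[ \mathit{zero}_\theta(x) \overset{\text{def}}{\equiv} \begin{cases}
a = 0^{8 \cdot \mathit{size}_\theta(t)} & : x = \mathbf{val}(a, t) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]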
Definition 5.33 (Function Predicate). Additionally, we would like to check whether a given
function pointer v ∈ val corresponds to a given function name f ∈ Fname :
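\[ \mathit{isfunc}(v, f, \theta) \overset{\text{def}}{\equiv} \exists t \in \mathbb{T}, a \in \mathbb{B}^{32}.\ \mathit{isfunptr}(t) \land (v = \mathbf{val}(a, t) \land \theta.\mathcal{F}_{adr}(f) = a \lor v = \mathbf{fun}(f, t)) \]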
Definition 5.34 (Combined Function Table). Given a C-IL program pi ∈ ProgCIL, we can define
the combined function table as
\[ \mathcal{F}^\theta_\pi \overset{\text{def}}{\equiv} \pi.\mathcal{F} \cup \theta.\mathit{intrinsics} \]
Note that the names of functions declared in the program must differ from the names of the
compiler intrinsic functions.
Definition 5.35 (C-IL External Function Predicate). In order to test whether a given function
f ∈ Fname declared in the program π (i.e., f ∈ dom(F^θ_π)) is an external one, we use the predicate
\[ \mathit{ext}(f, \pi, \theta) \overset{\text{def}}{\equiv} \mathcal{F}^\theta_\pi(f).P = \mathbf{extern} \]
5.2.7 Semantics of C-IL Memory Accesses
In C-IL we access the byte-addressable memory in order to read and write C-IL values. For con-
verting values between the byte-string and C-IL representations, we provide auxiliary functions
introduced in [Sch13].
Definition 5.36 (Converting Values to/from Byte-Strings). We define two partial functions
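\[ \mathit{val2bytes} : \mathit{val} \rightharpoonup (\mathbb{B}^8)^* \qquad \mathit{bytes2val} : (\mathbb{B}^8)^* \times \mathbb{T} \rightharpoonup \mathit{val} \]
such that
\[ \mathit{val2bytes}(v) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{bits2bytes}(b) & : v = \mathbf{val}(b, t) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
\[ \mathit{bytes2val}(bs, t) \overset{\text{def}}{\equiv} \begin{cases}
\mathbf{val}(\mathit{bytes2bits}(bs), t) & : t \neq \mathbf{struct}\ t_C \\
\text{undefined} & : \text{otherwise}
\end{cases} \]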
Now we can define functions which read and write from/to the global and local memories
of the C-IL machine.
Definition 5.37 (Reading Byte-Strings from Global Memory). For a memory M : B32 → B8 of
the C-IL machine, an address a ∈ B32, and a length s ∈ N, the function readgm(M, a, s) ∈ (B8)∗
returns a byte-string of length s computed as follows:
\[ \mathit{read}_{gm}(\mathcal{M}, a, s) \overset{\text{def}}{\equiv} \begin{cases}
\mathcal{M}(a) & : s = 1 \\
\mathit{read}_{gm}(\mathcal{M}, a +_{32} 1_{32}, s - 1) \circ \mathcal{M}(a) & : \text{otherwise}
\end{cases} \]
Definition 5.38 (Writing Byte-Strings to Global Memory). Analogously, for a memory M and
a byte-string bs ∈ (B8)∗ to be written to the memory at a starting address a ∈ B32 we define the
function writegm(M, a, bs) : B32 → B8 updating the memory in a way such that for any byte
address x ∈ B32 we have
\[ \mathit{write}_{gm}(\mathcal{M}, a, bs)(x) \overset{\text{def}}{\equiv} \begin{cases}
bs[\langle x \rangle - \langle a \rangle] & : 0 \le \langle x \rangle - \langle a \rangle < |bs| \\
\mathcal{M}(x) & : \text{otherwise}
\end{cases} \]
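A minimal sketch of the two global memory functions follows. It assumes a memory modelled as a Python dictionary in which unset addresses read as 0, and byte-strings modelled as lists whose element k is the byte at address a + k; both choices are illustrative and not part of the thesis.

```python
ADDR_MAX = 0xFFFFFFFF  # largest 32-bit address

def read_gm(mem, a, s):
    """Return s bytes starting at address a; element k is the byte at a + k.

    The address is incremented modulo 2**32, mirroring a +32 1.
    Unset addresses read as 0 (an assumption of this sketch).
    """
    return [mem.get((a + k) & ADDR_MAX, 0) for k in range(s)]

def write_gm(mem, a, bs):
    """Return a new memory that equals mem except for the bytes of bs from a on.

    As in the definition, only addresses x with 0 <= x - a < len(bs) change,
    so a write does not wrap around the end of the address space.
    """
    new_mem = dict(mem)
    for k, byte in enumerate(bs):
        if a + k <= ADDR_MAX:
            new_mem[a + k] = byte
    return new_mem
```

Returning a fresh dictionary keeps the update functional, which matches the way configurations are updated in the transition function.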
Definition 5.39 (Reading Byte-Strings from Local Memory). A byte-string of length s ∈ N read
from a local memory ME : V ⇀ (B8)∗ for a local variable v ∈ V starting at an offset o ∈ N0 is
computed by the partial function
\[ \mathit{read}_{lm}(\mathcal{M}_E, v, o, s) \overset{\text{def}}{\equiv} \mathcal{M}_E(v)[o + s - 1] \circ \cdots \circ \mathcal{M}_E(v)[o] \]
If s + o > |ME(v)| or v ∉ dom(ME), then the function readlm(ME, v, o, s) is undefined for the
given parameters.
Definition 5.40 (Writing Byte-Strings to Local Memory). In order to write a byte-string bs ∈
(B8)∗ to a variable v ∈ V from a local memory ME : V ⇀ (B8)∗ at an offset o ∈ N0, we introduce
the function writelm(ME, v, o, bs) updating the local memory in a way such that for any w ∈ V
and i < |ME(w)| one gets
\[ \mathit{write}_{lm}(\mathcal{M}_E, v, o, bs)(w)[i] \overset{\text{def}}{\equiv} \begin{cases}
bs[i - o] & : w = v \land o \le i < o + |bs| \\
\mathcal{M}_E(w)[i] & : \text{otherwise}
\end{cases} \]
If |bs| + o > |ME(v)| or v ∉ dom(ME), then the function writelm(ME, v, o, bs) is undefined.
Finally, we are ready to define similar functions on a C-IL configuration.
Definition 5.41 (Reading a Value from a C-IL Configuration). Given the environment param-
eters θ ∈ ParamsCIL, a C-IL configuration c ∈ CCIL, and a pointer value x ∈ val , the partial
function read(θ, c, x) ∈ val returns a C-IL value read from the memory pointed to:
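\[ \mathit{read}(\theta, c, x) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{bytes2val}(\mathit{read}_{gm}(c.\mathcal{M}, a, \mathit{size}_\theta(t)), t) & : x = \mathbf{val}(a, \mathbf{ptr}(t)) \\
\mathit{bytes2val}(\mathit{read}_{lm}(c.\mathit{stack}[i].\mathcal{M}_E, v, o, \mathit{size}_\theta(t)), t) & : x = \mathbf{lref}((v, o), i, \mathbf{ptr}(t)) \\
\mathit{read}(\theta, c, \mathbf{val}(a, \mathbf{ptr}(t))) & : x = \mathbf{val}(a, \mathbf{array}(t, n)) \\
\mathit{read}(\theta, c, \mathbf{lref}((v, o), i, \mathbf{ptr}(t))) & : x = \mathbf{lref}((v, o), i, \mathbf{array}(t, n)) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]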
Definition 5.42 (Writing a Value to a C-IL Configuration). In order to write a given C-IL value
y ∈ val to a C-IL configuration c ∈ CCIL at the memory pointed to by a pointer x ∈ val according
to the environment parameters θ, we define the function
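\[ \mathit{write}(\theta, c, x, y) \overset{\text{def}}{\equiv} \begin{cases}
c\,[\mathcal{M} := \mathit{write}_{gm}(c.\mathcal{M}, a, \mathit{val2bytes}(y))] & : x = \mathbf{val}(a, \mathbf{ptr}(t)) \land y = \mathbf{val}(b, t) \\
c' & : x = \mathbf{lref}((v, o), i, \mathbf{ptr}(t)) \land y = \mathbf{val}(b, t) \\
\mathit{write}(\theta, c, \mathbf{val}(a, \mathbf{ptr}(t)), y) & : x = \mathbf{val}(a, \mathbf{array}(t, n)) \\
\mathit{write}(\theta, c, \mathbf{lref}((v, o), i, \mathbf{ptr}(t)), y) & : x = \mathbf{lref}((v, o), i, \mathbf{array}(t, n)) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
where c′.stack[i].ME = writelm(c.stack[i].ME, v, o, val2bytes(y)) and all other components of c′
are identical to c.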
5.2.8 Type and Expression Evaluation
In order to define expression evaluation, we first introduce a few auxiliary functions computing
types of values, variables, functions, and expressions.
Definition 5.43 (Type of a Value). For a given C-IL value x ∈ val , the function τ(x) ∈ T extracts
the type of x as
\[ \tau(x) \overset{\text{def}}{\equiv} \begin{cases}
t & : x = \mathbf{val}(y, t) \\
t & : x = \mathbf{fun}(y, t) \\
t & : x = \mathbf{lref}((v, o), i, t)
\end{cases} \]
Definition 5.44 (Declared Variables). For a variable declaration list V ∈ (V × TQ)∗, we define
the function decl(V ) ∈ 2V that returns a set of declared variable names:
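\[ \mathit{decl}(V) \overset{\text{def}}{\equiv} \begin{cases}
\{v\} \cup \mathit{decl}(V') & : V = V' \circ (v, t) \\
\emptyset & : V = \varepsilon
\end{cases} \]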
Definition 5.45 (Type of a Variable in a Declaration List). Given a variable v ∈ V, and a
declaration list T ∈ (V × TQ)∗, the function τV (v, T ) ∈ TQ returns the type of this variable, if it
is present in the list:
\[ \tau_V(v, T) \overset{\text{def}}{\equiv} \begin{cases}
t & : T = T' \circ (v, t) \\
\tau_V(v, T') & : T = T' \circ (v', t) \land v' \neq v \\
\text{undefined} & : T = \varepsilon
\end{cases} \]
Definition 5.46 (Type of a Field in a Declaration List). Analogously, for a given field f ∈ F,
and a declaration list T ∈ (F× TQ)∗, the function τF (f, T ) ∈ TQ is defined as
\[ \tau_F(f, T) \overset{\text{def}}{\equiv} \begin{cases}
t & : T = T' \circ (f, t) \\
\tau_F(f, T') & : T = T' \circ (f', t) \land f' \neq f \\
\text{undefined} & : T = \varepsilon
\end{cases} \]
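The lookup in a declaration list can be sketched as below, assuming declaration lists are Python lists of (name, type) pairs and that `None` models "undefined". The function name `type_of_decl` is hypothetical. The same code serves for variables and for fields, since both definitions have the same shape.

```python
def type_of_decl(name, decls):
    """Type of `name` in a declaration list of (name, type) pairs.

    The list is scanned from its last element towards the first, mirroring
    the recursion on T = T' o (v, t). None models "undefined".
    """
    for declared, t in reversed(decls):
        if declared == name:
            return t
    return None
```

One consequence of peeling off the last element first is that, if a name were declared twice, the later declaration would be the one found.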
Recall that the declaration lists of global variables, fields for each struct type, and local
variables for each function are given in the C-IL program. In the evaluation, we will often need to
refer to the declaration of the local variables used in the top-most stack frame. For this purpose
we use the already introduced shorthand Vtop(c, π) for a configuration c ∈ CCIL and a program
π ∈ ProgCIL.
Definition 5.47 (Type of a Function). In order to extract the type information for a function
with a name f ∈ Fname from a given function table F : Fname ⇀ FunT we define the function
τFfun(f) ∈ TQ such that
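\[ \tau^{\mathcal{F}}_{fun}(f) \overset{\text{def}}{\equiv} \begin{cases}
(\emptyset, \mathbf{funptr}(\mathcal{F}(f).\mathit{rettype}, T)) & : f \in \mathit{dom}(\mathcal{F}) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]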
where the list T of length npar ≡ F(f).npar contains the parameter types Ti such that for all
i ∈ Nnpar the predicate ∃vi ∈ V. F(f).V [i] = (vi, Ti) holds.
Definition 5.48 (Qualified Type Evaluation). Now, for computing a qualified type of a given
expression in a body of a non-external C-IL function f ∈ Fname of a program pi ∈ ProgCIL wrt.
the environment parameters θ, we define the function
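\[ \tau^{\pi,\theta}_f : \mathbb{E} \to \mathbb{T}_Q \]
by a case distinction over a given expression:
• constant: x ∈ val
\[ \tau^{\pi,\theta}_f(x) \overset{\text{def}}{\equiv} (\emptyset, \tau(x)) \]
• variable names: v ∈ V
\[ \tau^{\pi,\theta}_f(v) \overset{\text{def}}{\equiv} \begin{cases}
\tau_V(v, \pi.\mathcal{F}(f).V) & : v \in \mathit{decl}(\pi.\mathcal{F}(f).V) \\
\tau_V(v, \pi.V_G) & : v \notin \mathit{decl}(\pi.\mathcal{F}(f).V) \land v \in \mathit{decl}(\pi.V_G) \\
(\emptyset, \mathbf{void}) & : \text{otherwise}
\end{cases} \]
• function names: fn ∈ Fname
\[ \tau^{\pi,\theta}_f(fn) \overset{\text{def}}{\equiv} \tau^{\mathcal{F}^\theta_\pi}_{fun}(fn) \]
• unary operator: e ∈ E, ⊖ ∈ O1
\[ \tau^{\pi,\theta}_f(\ominus e) \overset{\text{def}}{\equiv} \tau^{\pi,\theta}_f(e) \]
• binary operator: e1, e2 ∈ E, ⊗ ∈ O2
\[ \tau^{\pi,\theta}_f(e_1 \otimes e_2) \overset{\text{def}}{\equiv} \tau^{\pi,\theta}_f(e_1) \]
Note here that since the type is determined by the first operand, one has to exclude pointer
arithmetic with the first operand of a primitive type.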
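• ternary operator: e, e1, e2 ∈ E
\[ \tau^{\pi,\theta}_f(e\ ?\ e_1 : e_2) \overset{\text{def}}{\equiv} \tau^{\pi,\theta}_f(e_1) \]
• type cast: t ∈ TQ, e ∈ E
\[ \tau^{\pi,\theta}_f((t)e) \overset{\text{def}}{\equiv} t \]
• pointer dereferencing: e ∈ E
\[ \tau^{\pi,\theta}_f(*(e)) \overset{\text{def}}{\equiv} \begin{cases}
t & : \tau^{\pi,\theta}_f(e) = (q, \mathbf{ptr}(t)) \\
t & : \tau^{\pi,\theta}_f(e) = (q, \mathbf{array}(t, n)) \\
(\emptyset, \mathbf{void}) & : \text{otherwise}
\end{cases} \]
• address of: e ∈ E
\[ \tau^{\pi,\theta}_f(\&(e)) \overset{\text{def}}{\equiv} \begin{cases}
\tau^{\pi,\theta}_f(e') & : e = *(e') \\
(\emptyset, \mathbf{ptr}(\tau^{\pi,\theta}_f(v))) & : e = v \\
(\emptyset, \mathbf{ptr}(q' \cup q'', X)) & : e = (e').z \land \tau^{\pi,\theta}_f(e') = (q', \mathbf{struct}\ t_C) \land \tau_F(z, \pi.T_F(t_C)) = (q'', X) \\
(\emptyset, \mathbf{void}) & : \text{otherwise}
\end{cases} \]
• field access: e ∈ E, z ∈ F
\[ \tau^{\pi,\theta}_f((e).z) \overset{\text{def}}{\equiv} \tau^{\pi,\theta}_f(*(\&((e).z))) \]
• size of a type or an expression: x ∈ TQ ∪ E
\[ \tau^{\pi,\theta}_f(\mathbf{sizeof}(x)) \overset{\text{def}}{\equiv} (\emptyset, \theta.\mathit{size\_t}) \]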
Note that this type evaluation was defined in [Sch13] for a given C-IL configuration instead of
a given function. The version of the evaluation τpi,θf (e) considered here in detail was introduced
in [Bau14b] without its definition. Obviously, one can easily define the type evaluation on a
given configuration c ∈ CCIL as
\[ \tau^{\pi,\theta}_c(e) \overset{\text{def}}{\equiv} \tau^{\pi,\theta}_{f_{top}(c)}(e) \]
Definition 5.49 (Field Reference Computation). Given a C-IL program pi ∈ ProgCIL and the
environment parameters θ, we define the partial function σpiθ (x, f) ∈ val taking a pointer or a
local reference x ∈ val to a struct type and computing a pointer or a local reference to a given
field f ∈ F in the following way
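\[ \sigma^\pi_\theta(x, f) \overset{\text{def}}{\equiv} \begin{cases}
\mathbf{val}(a +_{32} o'_{32}, \mathbf{ptr}(t'')) & : x = \mathbf{val}(a, t) \land t \in \{\mathbf{ptr}(t'), \mathbf{array}(t', n)\} \\
\mathbf{lref}((v, o + o'), i, \mathbf{ptr}(t'')) & : x = \mathbf{lref}((v, o), i, t) \land t \in \{\mathbf{ptr}(t'), \mathbf{array}(t', n)\} \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
where the shorthands t′, t′′, and o′ denote
\[ t' \equiv \mathbf{struct}\ t_C \qquad t'' \equiv \mathit{qt2t}(\tau_F(f, \pi.T_F(t_C))) \qquad o' \equiv \theta.\mathit{offset}_{struc}(t_C, f) \]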
The definition of σ^π_θ(x, f) was introduced in the original work [Sch13]. However, the
computation was defined only for values x of pointer types. Therefore, the C-IL semantics did not
allow accessing a field in an array of structures. Since such arrays are common in
programming practice, the definition of σ^π_θ(x, f) was extended here.
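The address arithmetic of the field reference computation can be illustrated with the following sketch. It drops all type information and takes the byte offset of the field as an argument; the tuple encodings `("val", a)` and `("lref", v, o, i)` are assumptions of this example only.

```python
def field_ref(x, offset):
    """Pointer or local reference to a field at byte offset `offset` inside x.

    x is ("val", a) for a global pointer or ("lref", v, o, i) for a local
    reference to variable v at offset o in frame i. Types are omitted.
    """
    if x[0] == "val":
        return ("val", (x[1] + offset) & 0xFFFFFFFF)  # 32-bit modular addition
    if x[0] == "lref":
        _, v, o, i = x
        return ("lref", v, o + offset, i)  # offsets add as natural numbers
    return None  # models "undefined"
```

The two cases differ in how the offset is added: global addresses use 32-bit modular addition, whereas offsets inside a local variable are natural numbers.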
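Definition 5.50 (Expression Evaluation). Finally, the function [[·]]^{π,θ}_c : E ⇀ val evaluates a given
expression of a program π ∈ ProgCIL in a C-IL configuration c ∈ CCIL wrt. the environment
parameters θ in the following way:
• constant: x ∈ val
\[ \llbracket x \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} x \]
• variable names: v ∈ V
\[ \llbracket v \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \llbracket *(\&(v)) \rrbracket^{\pi,\theta}_c \]
• function names: fn ∈ Fname
\[ \llbracket fn \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \begin{cases}
\mathbf{val}(\theta.\mathcal{F}_{adr}(fn), \mathit{qt2t}(\tau^{\mathcal{F}^\theta_\pi}_{fun}(fn))) & : fn \in \mathit{dom}(\mathcal{F}^\theta_\pi) \land fn \in \mathit{dom}(\theta.\mathcal{F}_{adr}) \\
\mathbf{fun}(fn, \mathit{qt2t}(\tau^{\mathcal{F}^\theta_\pi}_{fun}(fn))) & : fn \in \mathit{dom}(\mathcal{F}^\theta_\pi) \land fn \notin \mathit{dom}(\theta.\mathcal{F}_{adr}) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
• unary operator: e ∈ E, ⊖ ∈ O1
\[ \llbracket \ominus e \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \ominus \llbracket e \rrbracket^{\pi,\theta}_c \]
• binary operator: e1, e2 ∈ E, ⊗ ∈ O2
\[ \llbracket e_1 \otimes e_2 \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \llbracket e_1 \rrbracket^{\pi,\theta}_c \otimes \llbracket e_2 \rrbracket^{\pi,\theta}_c \]
• ternary operator: e, e1, e2 ∈ E
\[ \llbracket e\ ?\ e_1 : e_2 \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \begin{cases}
\llbracket e_1 \rrbracket^{\pi,\theta}_c & : \lnot \mathit{zero}_\theta(\llbracket e \rrbracket^{\pi,\theta}_c) \\
\llbracket e_2 \rrbracket^{\pi,\theta}_c & : \mathit{zero}_\theta(\llbracket e \rrbracket^{\pi,\theta}_c) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
• type cast: t ∈ TQ, e ∈ E
\[ \llbracket (t)e \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \theta.\mathit{cast}(\llbracket e \rrbracket^{\pi,\theta}_c, \mathit{qt2t}(t)) \]
• pointer dereferencing: e ∈ E
\[ \llbracket *(e) \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \begin{cases}
\mathit{read}(\theta, c, \llbracket e \rrbracket^{\pi,\theta}_c) & : (\tau(\llbracket e \rrbracket^{\pi,\theta}_c) = \mathbf{ptr}(t) \land \lnot \mathit{isarray}(t)) \lor \tau(\llbracket e \rrbracket^{\pi,\theta}_c) = \mathbf{array}(t, n) \\
\mathbf{val}(a, \mathbf{array}(t, n)) & : \llbracket e \rrbracket^{\pi,\theta}_c = \mathbf{val}(a, \mathbf{ptr}(\mathbf{array}(t, n))) \\
\mathbf{lref}((v, o), i, \mathbf{array}(t, n)) & : \llbracket e \rrbracket^{\pi,\theta}_c = \mathbf{lref}((v, o), i, \mathbf{ptr}(\mathbf{array}(t, n))) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
• address of: e ∈ E
\[ \llbracket \&(e) \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \begin{cases}
\llbracket e' \rrbracket^{\pi,\theta}_c & : e = *(e') \\
\mathbf{lref}((v, 0), |c.\mathit{stack}|, \mathbf{ptr}(t')) & : e = v \land v \in \mathit{decl}(V_{top}(c, \pi)) \\
\mathbf{val}(\theta.\mathit{alloc}_{gvar}(v), \mathbf{ptr}(t'')) & : e = v \land v \notin \mathit{decl}(V_{top}(c, \pi)) \land v \in \mathit{decl}(\pi.V_G) \\
\sigma^\pi_\theta(\llbracket \&(e') \rrbracket^{\pi,\theta}_c, f) & : e = (e').f \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
where
\[ t' \equiv \mathit{qt2t}(\tau_V(v, V_{top}(c, \pi))) \qquad t'' \equiv \mathit{qt2t}(\tau_V(v, \pi.V_G)) \]
• field access: e ∈ E, f ∈ F
\[ \llbracket (e).f \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \llbracket *(\&((e).f)) \rrbracket^{\pi,\theta}_c \]
• size of a type: t ∈ TQ
\[ \llbracket \mathbf{sizeof}(t) \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \mathbf{val}(\mathit{size}_\theta(\mathit{qt2t}(t))_{32}, \theta.\mathit{size\_t}) \]
• size of an expression: e ∈ E
\[ \llbracket \mathbf{sizeof}(e) \rrbracket^{\pi,\theta}_c \overset{\text{def}}{\equiv} \llbracket \mathbf{sizeof}(\tau^{\pi,\theta}_c(e)) \rrbracket^{\pi,\theta}_c \]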
5.2.9 Transition Function
First, we introduce functions performing specific updates on a C-IL configuration.
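Definition 5.51 (C-IL Configuration Update Functions). For a given C-IL configuration c ∈
CCIL with a non-empty stack c.stack ≠ ε, we define the following operations where t ≡ top(c):
• incrementing the location counter:
\[ \mathit{inc}_{loc}(c) \overset{\text{def}}{\equiv} c\,[\mathit{stack} := c.\mathit{stack}[t \mapsto \mathit{frame}']] \quad \text{with } \mathit{frame}' \equiv c.\mathit{stack}[t]\,[\mathit{loc} := \mathit{loc}_{top}(c) + 1] \]
• setting the location counter to l ∈ N:
\[ \mathit{set}_{loc}(c, l) \overset{\text{def}}{\equiv} c\,[\mathit{stack} := c.\mathit{stack}[t \mapsto \mathit{frame}']] \quad \text{with } \mathit{frame}' \equiv c.\mathit{stack}[t]\,[\mathit{loc} := l] \]
• removing the top-most frame:
\[ \mathit{drop}_{frame}(c) \overset{\text{def}}{\equiv} c\,[\mathit{stack} := c.\mathit{stack}[1 : t - 1]] \]
• setting the return destination to x ∈ valptr ∪ vallref ∪ {⊥}:
\[ \mathit{set}_{rds}(c, x) \overset{\text{def}}{\equiv} c\,[\mathit{stack} := c.\mathit{stack}[t \mapsto \mathit{frame}']] \quad \text{with } \mathit{frame}' \equiv c.\mathit{stack}[t]\,[\mathit{rds} := x] \]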
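The four update functions can be sketched with immutable records. The classes `Frame` and `Config` below are simplified stand-ins for C-IL frames and configurations (memories are left abstract), and `None` models ⊥; all names are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Frame:
    f: str                 # function name
    rds: Optional[object]  # return destination; None models the absent one
    loc: int               # location counter
    me: tuple = ()         # local memory, left abstract in this sketch

@dataclass(frozen=True)
class Config:
    stack: Tuple[Frame, ...]  # the top-most frame is the last element
    mem: tuple = ()           # global memory, left abstract in this sketch

def _update_top(c, **changes):
    """Replace selected components of the top-most frame only."""
    assert c.stack, "the update functions require a non-empty stack"
    return replace(c, stack=c.stack[:-1] + (replace(c.stack[-1], **changes),))

def inc_loc(c):
    return _update_top(c, loc=c.stack[-1].loc + 1)

def set_loc(c, l):
    return _update_top(c, loc=l)

def drop_frame(c):
    assert c.stack, "the update functions require a non-empty stack"
    return replace(c, stack=c.stack[:-1])

def set_rds(c, x):
    return _update_top(c, rds=x)
```

All four operations return a new configuration and leave the argument and every frame below the top unchanged, as in the definition.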
Obviously, we are interested in the steps of the C-IL machine performed on well-formed (or
valid) configurations.
Definition 5.52 (Well-Formed C-IL Configuration). A sequential C-IL configuration c ∈ CCIL
is well-formed wrt. a C-IL program pi ∈ ProgCIL and the environment parameters θ iff its stack
st ≡ c.stack is well-formed. This means that (i) the stack st is not empty and (ii) every stack frame
(a) corresponds to a function implemented in the given C-IL program, and (b) has the location
inside the function. Moreover, (c) the local memory of each frame contains all variables and
parameters declared in the function and (d) has the proper size. Additionally, (e) if the function
is called with a return value, its destination corresponds to the type of the return value in the
function declaration.
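\[ \mathit{wfstack}^{\pi,\theta}_{CIL}(st) \overset{\text{def}}{\equiv} \begin{array}[t]{l}
\text{(i)}\ \ st \neq \varepsilon \\
\text{(ii)}\ \ \forall i \in \mathbb{N}_{\mathit{top}(st)}. \\
\quad \text{(a)}\ \ f_i(st) \in \mathit{dom}(\mathcal{F}^\theta_\pi) \land \lnot \mathit{ext}(f_i(st), \pi, \theta) \\
\quad \text{(b)}\ \ \mathit{loc}_i(st) \in [1 : |P_i(st, \pi)|] \\
\quad \text{(c)}\ \ \mathit{dom}(\mathcal{M}_{E\,i}(st)) = \mathit{decl}(V_i(st, \pi)) \\
\quad \text{(d)}\ \ \forall (v, t) \in V_i(st, \pi).\ |\mathcal{M}_{E\,i}(st)(v)| = \mathit{size}_\theta(\mathit{qt2t}(t)) \\
\quad \text{(e)}\ \ i < \mathit{top}(st) \land \tau(\mathit{rds}_i(st)) \in \{\mathbf{ptr}(t), \mathbf{array}(t, n)\} \implies t = \mathit{qt2t}(\mathit{rettype}_{i+1}(st, \pi))
\end{array} \]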
Finally, we can formalize the operational semantics of the sequential C-IL.
Definition 5.53 (Sequential C-IL Transition Function). For a given C-IL program pi ∈ ProgCIL
and the environment parameters θ ∈ ParamsCIL, the transitions of the sequential C-IL machine
are defined by the function
\[ \delta^{\pi,\theta}_{CIL} : C_{CIL} \to C_{CIL\bot} \]
with C_{CIL⊥} def≡ CCIL ∪ {⊥} containing the error state ⊥.
Note that the C-IL transition function in [Sch13] has an additional input resolving the
non-deterministic choice of initial values for local variables in the newly created stack frame. In the
scope of this work, however, analogously to the C0 semantics [PBLS15] we initialize the local
variables with zeros. Moreover, we also use the error state to signal any run-time errors.
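So, for a C-IL configuration c ∈ CCIL we define δ^{π,θ}_CIL(c) by a case split on the next C-IL
statement to be executed and on the kind of the function in case of a function call:
• Assignment: stmtnext(c, π) = (e0 = e1)
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \mathit{inc}_{loc}(\mathit{write}(\theta, c, \llbracket \&(e_0) \rrbracket^{\pi,\theta}_c, \llbracket e_1 \rrbracket^{\pi,\theta}_c)) \]
• Goto: stmtnext(c, π) = goto l
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{set}_{loc}(c, l) & : l \le |P_{top}(c, \pi)| \\
\bot & : \text{otherwise}
\end{cases} \]
• If-Not-Goto: stmtnext(c, π) = ifnot e goto l
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{set}_{loc}(c, l) & : \mathit{zero}_\theta(\llbracket e \rrbracket^{\pi,\theta}_c) \land l \le |P_{top}(c, \pi)| \\
\mathit{inc}_{loc}(c) & : \lnot \mathit{zero}_\theta(\llbracket e \rrbracket^{\pi,\theta}_c) \\
\bot & : \text{otherwise}
\end{cases} \]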
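• Function return without a return value: stmtnext(c, π) = return
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \begin{cases}
c' & : \mathit{rds}_{top}(c') = \bot \land \mathit{top}(c) > 1 \\
\bot & : \text{otherwise}
\end{cases} \]
where c′ ≡ dropframe(c)
• Function return with a return value: stmtnext(c, π) = return e
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{write}(\theta, \mathit{set}_{rds}(c', \bot), \mathit{rds}_{top}(c'), \llbracket e \rrbracket^{\pi,\theta}_c) & : \mathit{rds}_{top}(c') \neq \bot \land \mathit{top}(c) > 1 \\
c' & : \mathit{rds}_{top}(c') = \bot \land \mathit{top}(c) > 1 \\
\bot & : \text{otherwise}
\end{cases} \]
where c′ ≡ dropframe(c)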
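• Non-external function call:
– the next statement is a function call with or without a return value
\[ \mathit{stmt}_{next}(c, \pi) \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \]
– the expression e evaluates to some non-external function f
\[ \mathit{isfunc}(\llbracket e \rrbracket^{\pi,\theta}_c, f, \theta) \land f \in \mathit{dom}(\mathcal{F}^\theta_\pi) \land \lnot \mathit{ext}(f, \pi, \theta) \]
– the types of all parameters match the function declaration
\[ |E| = \mathcal{F}^\theta_\pi(f).\mathit{npar} \land \forall i \in [1 : |E|].\ \mathcal{F}^\theta_\pi(f).V[i] = (v, t) \implies \tau^{\pi,\theta}_c(E[i]) = t \]
– for a call with a return value the function must return a result of a non-void type
\[ \mathit{stmt}_{next}(c, \pi) = (e_0 = \mathbf{call}\ e(E)) \implies \mathit{qt2t}(\mathcal{F}^\theta_\pi(f).\mathit{rettype}) \neq \mathbf{void} \]
If these conditions hold, the non-error result of the transition is defined as
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} c' \]
such that
\[ c'.\mathit{stack} = \mathit{inc}_{loc}(\mathit{set}_{rds}(c, \mathit{rds}')).\mathit{stack} \circ (f, \bot, 1, \mathcal{M}'_E) \qquad c'.\mathcal{M} = c.\mathcal{M} \]
where
\[ \mathit{rds}' \equiv \begin{cases}
\llbracket \&(e_0) \rrbracket^{\pi,\theta}_c & : \mathit{stmt}_{next}(c, \pi) = (e_0 = \mathbf{call}\ e(E)) \\
\bot & : \mathit{stmt}_{next}(c, \pi) = \mathbf{call}\ e(E)
\end{cases} \]
and
\[ \mathcal{M}'_E(v) = \begin{cases}
\mathit{val2bytes}(\llbracket E[i] \rrbracket^{\pi,\theta}_c) & : \mathcal{F}^\theta_\pi(f).V[i] = (v, t) \land i \le \mathcal{F}^\theta_\pi(f).\mathit{npar} \\
(0^8)^{\mathit{size}_\theta(\mathit{qt2t}(t))} & : \mathcal{F}^\theta_\pi(f).V[i] = (v, t) \land i > \mathcal{F}^\theta_\pi(f).\mathit{npar} \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
As follows from the computation of M′E, in the C-IL semantics no function can have
parameters that are not representable as byte-strings, i.e., local references and symbolic function
values.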
• Compare-and-swap call (external intrinsic function):
– the next statement is a function call without a return value
\[ \mathit{stmt}_{next}(c, \pi) = \mathbf{call}\ e(E) \]
– the expression e evaluates to the compare-and-swap intrinsic function cas ∈ Fname
\[ \mathit{isfunc}(\llbracket e \rrbracket^{\pi,\theta}_c, cas, \theta) \]
– the types of all parameters match the function declaration
\[ |E| = \mathcal{F}^\theta_\pi(f).\mathit{npar} \land \forall i \in \mathbb{N}_{|E|}.\ \mathcal{F}^\theta_\pi(f).V[i] = (v, t) \implies \mathit{qt2t}(\tau^{\pi,\theta}_c(E[i])) = \mathit{qt2t}(t) \]
Let the input parameters be E = e_dest ◦ e_cmp ◦ e_exch ◦ e_ret. Then the transition result is
defined as
\[ \delta^{\pi,\theta}_{CIL}(c) \overset{\text{def}}{\equiv} \begin{cases}
\mathit{inc}_{loc}(\mathit{write}(\theta, c', \llbracket e_{dest} \rrbracket^{\pi,\theta}_c, \llbracket e_{exch} \rrbracket^{\pi,\theta}_c)) & : \mathit{val}_{read} = \llbracket e_{cmp} \rrbracket^{\pi,\theta}_c \\
\mathit{inc}_{loc}(c') & : \mathit{val}_{read} \neq \llbracket e_{cmp} \rrbracket^{\pi,\theta}_c
\end{cases} \]
where val_read ≡ read(θ, c, [[e_dest]]^{π,θ}_c) and c′ ≡ write(θ, c, [[e_ret]]^{π,θ}_c, val_read).
In all other cases, or if the evaluation of any expression present in the statement fails, or the
application of any other function involved in the transition is undefined, the computation
also leads to the error state δ^{π,θ}_CIL(c) def≡ ⊥.
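The order of the two writes in the compare-and-swap step can be illustrated with a small sketch. It assumes a word-addressed memory modelled as a dictionary and already evaluated arguments; the function `cas_step` is hypothetical, and neither the location counter nor the error cases are modelled.

```python
def cas_step(mem, dest, cmp, exch, ret):
    """Memory effect of one compare-and-swap transition.

    The value read from dest (in the original memory) is first stored at ret;
    dest is then overwritten with exch only if the value read equals cmp.
    """
    val_read = mem[dest]        # corresponds to read(theta, c, [[e_dest]])
    new_mem = dict(mem)
    new_mem[ret] = val_read     # corresponds to c' = write(theta, c, [[e_ret]], val_read)
    if val_read == cmp:
        new_mem[dest] = exch    # corresponds to write(theta, c', [[e_dest]], [[e_exch]])
    return new_mem
```

In both outcomes the caller can inspect the location `ret` afterwards: it holds the old content of `dest`, so comparing it with `cmp` tells whether the swap took place.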
5.3 Sequential Mixed Machine Semantics
5.3.1 MX Programs
Definition 5.54 (MX Program). A mixed machine program is simply represented by a pair
of a MASM program and a C-IL program:
\[ \mathit{Prog}_{MX} \overset{\text{def}}{\equiv} \mathit{Prog}_{MASM} \times \mathit{Prog}_{CIL} \]
In this chapter we will mostly consider an MX program π ∈ ProgMX composed as π =
(π_µ, π_cil).
5.3.2 Environment Parameters
Since the environment parameters are only required for the C-IL semantics, we directly use the
same θ ∈ ParamsCIL in this section.
5.3.3 Machine Configuration
In order to obtain an integrated model of C-IL and MASM, one has to consider a list of stacks of
both machines with a common byte-addressable memory and some additional individual com-
ponents relevant for the machine execution and switching between C-IL and MASM. According
to [Sha12] such stacks with additional information are called execution contexts. A context cor-
responding to a machine currently performing a step is an active one whereas all others are
considered to be inactive. In fact, the active execution context is a configuration of the corre-
sponding machine without the memory.
Before defining states of the MX machine, we first consider such contexts in detail.
Definition 5.55 (Active MASM Context). An active MASM execution context consists of the
MASM stack and general purpose registers.
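\[ \mathit{context}^{active}_{MASM} \overset{\text{def}}{\equiv} (\mathit{stack} \in \mathit{frame}_{MASM}^*,\ \mathit{gpr} : \mathbb{B}^5 \rightharpoonup \mathbb{B}^{32}) \]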
Definition 5.56 (Inactive MASM Context). Instead of all GPRs used in the MASM configura-
tion and the MASM active context, an inactive MASM context keeps additionally to the stack
only the callee-save registers required for restoring the GPRs content after returning from the
C-IL machine.
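\[ \mathit{context}^{inactive}_{MASM} \overset{\text{def}}{\equiv} (\mathit{stack} \in \mathit{frame}_{MASM}^*,\ \mathit{gpr}_{callee} : \mathit{Reg}_{callee} \to \mathbb{B}^{32}) \]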
Note that gprcallee contains the state of registers before a C-IL function call from a MASM pro-
cedure in the topmost frame.
In contrast to the MASM execution contexts, active and inactive C-IL contexts are represented
by the C-IL stack.
Definition 5.57 (C-IL Context). Therefore, in both cases we operate with the C-IL context con-
taining a list of C-IL frames.
\[ \mathit{context}_{CIL} \overset{\text{def}}{\equiv} \mathit{frame}_{CIL}^* \]
Note that in the original version of the MX model in [SS12, Sha12] the inactive C-IL context
contains the callee-save registers too. However, during the work on this thesis together with the
authors of [SS12, Sha12] it was discovered that this component is not needed for the operational
semantics of the MX machine. Therefore, we do not include it into MX machine configurations
here.
Definition 5.58 (Sequential MX Configuration). Then mixed machine configurations are rep-
resented by a set of tuples
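\[ C_{MX} \overset{\text{def}}{\equiv} (\mathit{ac} \in \mathit{context}_{CIL} \cup \mathit{context}^{active}_{MASM},\ \mathit{ic} \in (\mathit{context}_{CIL} \cup \mathit{context}^{inactive}_{MASM})^*,\ \mathit{spr} : \mathbb{B}^5 \to \mathbb{B}^{32},\ \mathcal{M} : \mathbb{B}^{32} \to \mathbb{B}^8) \]
containing the active context ac, the list ic of inactive execution contexts, the byte-addressable
memory M, and the special purpose registers spr.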
We model SPRs as a component visible even during the C-IL execution. One of the reasons
stated in [SS12] is future work modeling interrupts in the C-IL semantics. Moreover, according
to the compiler calling convention, the compilers are not supposed to save and restore the spe-
cial purpose registers during function/procedure calls and returns. Hence, it is reasonable to
keep the values of the special purpose registers during C-IL steps of the mixed machine.
5.3.4 Transition Function
Before we consider the semantics of the mixed machine, we introduce a few auxiliary definitions
and shorthands.
Definition 5.59 (Type of Context). We define predicates checking whether an execution context
k ∈ context_CIL ∪ context^active_MASM ∪ context^inactive_MASM corresponds to C-IL or MASM, respectively.
\[ \mathit{cil}(k) \overset{\text{def}}{\equiv} (k \in \mathit{context}_{CIL}) \]
\[ \mathit{masm}(k) \overset{\text{def}}{\equiv} (k \in \mathit{context}^{active}_{MASM} \cup \mathit{context}^{inactive}_{MASM}) \]
Definition 5.60 (Number of Frames). We compute the number of frames in a single C-IL or MASM
context k as
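\[ \mathit{nf}_{cntx}(k) \overset{\text{def}}{\equiv} \begin{cases}
|k.\mathit{stack}| & : \mathit{masm}(k) \\
|k| & : \text{otherwise}
\end{cases} \]
Then for a given MX configuration c ∈ CMX the number of frames in its list of inactive contexts is
denoted by
\[ \mathit{nf}_{ic}(c) \overset{\text{def}}{\equiv} \begin{cases}
\sum_{j=1}^{|c.\mathit{ic}|} \mathit{nf}_{cntx}(c.\mathit{ic}[j]) & : c.\mathit{ic} \neq \varepsilon \\
0 & : \text{otherwise}
\end{cases} \]
Similarly, the number of frames in the active context of the configuration c is
\[ \mathit{nf}_{ac}(c) \overset{\text{def}}{\equiv} \mathit{nf}_{cntx}(c.\mathit{ac}) \]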
The semantics of the sequential MX machine is obviously based on the earlier defined C-IL
and MASM transition functions. Hence, in order to perform the corresponding steps of C-IL
and MASM machines, we first compute their configurations.
Definition 5.61 (Construction of C-IL Configuration from MX Machine). Given an MX
configuration c ∈ CMX with the active C-IL context, i.e., cil(c.ac) holds, we construct a corresponding
C-IL configuration confCIL(c) ∈ CCIL from the MX configuration as follows:
\[ \mathit{conf}_{CIL}(c) \overset{\text{def}}{\equiv} c_{cil} \]
such that the memory is copied and the stack is formed from dummy frames followed by the
frames of the active context
\[ c_{cil}.\mathcal{M} = c.\mathcal{M} \qquad c_{cil}.\mathit{stack} = \mathit{stack}_{any} \circ c.\mathit{ac} \]
where |stack_any| = nfic(c) and each dummy frame stack_any[i] with index i ∈ N_{nfic(c)} is an
arbitrary element of frameCIL.
Note that the content of dummy frames does not play any role in the definition of the MX
semantics because the machine never relies on it. The dummy frames are introduced only for the
proper numbering of the C-IL frames in the overall stack of the mixed machine because the frame
index is included in the local reference value. This fact was not taken into account in [Sha12].
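The role of the dummy frames can be seen in a short sketch. It assumes that MASM contexts are encoded as dictionaries with a "stack" entry and C-IL contexts as plain lists of frames; this encoding and the function names are chosen only for illustration.

```python
def nf_cntx(k):
    """Number of frames of a single context.

    MASM contexts are dicts with a "stack" entry; C-IL contexts are plain
    lists of frames (an encoding assumed for this sketch).
    """
    return len(k["stack"]) if isinstance(k, dict) else len(k)

def nf_ic(ic):
    """Total number of frames in the list of inactive contexts."""
    return sum(nf_cntx(k) for k in ic)

DUMMY = object()  # placeholder frame whose content is never inspected

def conf_cil_stack(ac, ic):
    """Stack of the constructed C-IL configuration for an active C-IL context."""
    return [DUMMY] * nf_ic(ic) + list(ac)
```

With one inactive C-IL frame and two inactive MASM frames, the single active frame ends up at position 4 of the constructed stack, which is the index a local reference into that frame would carry in the overall mixed machine stack.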
In contrast to the C-IL configuration, the state of the MASM machine does not store indices
of stack frames. Therefore, there is no need to add dummy frames in the construction of the
MASM stack from the active and inactive contexts of the mixed machine.
Definition 5.62 (Construction of MASM Configuration from MX Machine). Given an MX
configuration c ∈ CMX with the active MASM context, i.e., masm(c.ac) holds, we construct a
corresponding MASM configuration confMASM(c) ∈ CMASM from the MX configuration as
\[ \mathit{conf}_{MASM}(c) \overset{\text{def}}{\equiv} c_\mu \]
where the components of the MASM configuration c_µ are
\[ \begin{aligned}
c_\mu.\mathcal{M} &= c.\mathcal{M} & c_\mu.\mathit{stack} &= c.\mathit{ac}.\mathit{stack} \\
c_\mu.\mathit{gpr} &= c.\mathit{ac}.\mathit{gpr} & c_\mu.\mathit{spr} &= c.\mathit{spr}
\end{aligned} \]
Definition 5.63 (Well-Formed MX Configuration). An active context ac and a list of inactive
contexts ic are well-formed wrt. an MX program pi = (piµ, picil) ∈ ProgMX and the environment
parameters θ iff (i) the stack of every context is well-formed, (ii) the GPRs of the active MASM
context are well-formed, and (iii) – (v) the adjacent contexts are of different types. Let ni ≡ |ic|
be the number of inactive contexts. Then, the well-formedness predicate is defined as
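\[ \mathit{wfcntx}^{\pi,\theta}_{MX}(\mathit{ac}, \mathit{ic}) \overset{\text{def}}{\equiv} \begin{array}[t]{l}
\text{(i)}\ \ \forall k \in (\mathit{ic} \circ \mathit{ac}).\ (\mathit{masm}(k) \implies \mathit{wfstack}^{\pi_\mu}_{MASM}(k.\mathit{stack})) \land (\mathit{cil}(k) \implies \mathit{wfstack}^{\pi_{cil},\theta}_{CIL}(k)) \\
\text{(ii)}\ \ \mathit{masm}(\mathit{ac}) \implies \mathit{wfreg}_{MASM}(\mathit{ac}.\mathit{gpr}) \\
\text{(iii)}\ \ \forall j \in [1 : ni - 1].\ (\mathit{masm}(\mathit{ic}_j) \iff \mathit{cil}(\mathit{ic}_{j+1})) \land (\mathit{cil}(\mathit{ic}_j) \iff \mathit{masm}(\mathit{ic}_{j+1})) \\
\text{(iv)}\ \ \mathit{masm}(\mathit{ac}) \implies \mathit{ic} = \varepsilon \lor \mathit{cil}(\mathit{ic}_{ni}) \\
\text{(v)}\ \ \mathit{cil}(\mathit{ac}) \implies \mathit{ic} = \varepsilon \lor \mathit{masm}(\mathit{ic}_{ni})
\end{array} \]
A sequential MX configuration c ∈ CMX is well-formed iff its active and inactive contexts are
well-formed:
\[ \mathit{wfconf}^{\pi,\theta}_{MX}(c) \overset{\text{def}}{\equiv} \mathit{wfcntx}^{\pi,\theta}_{MX}(c.\mathit{ac}, c.\mathit{ic}) \]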
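Conditions (iii) to (v) together say that the chain of inactive contexts followed by the active one alternates between MASM and C-IL. A minimal sketch of this check, assuming contexts are abstracted to the strings "masm" and "cil" (the function name is hypothetical):

```python
def contexts_alternate(ac_kind, ic_kinds):
    """Check that adjacent contexts are of different types.

    ac_kind is "masm" or "cil" for the active context. ic_kinds lists the
    kinds of the inactive contexts in order; the last one is adjacent to the
    active context. Stack and register well-formedness are not checked here.
    """
    chain = list(ic_kinds) + [ac_kind]
    return all(a != b for a, b in zip(chain, chain[1:]))
```

Since there are only two kinds of contexts, requiring adjacent entries to differ is equivalent to the equivalences stated in (iii), and the case of an empty list of inactive contexts is accepted trivially, as in (iv) and (v).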
Definition 5.64 (Sequential MX Transition Function). For a given MX program pi ∈ ProgMX
such that pi = (piµ, picil) and the environment parameters θ ∈ ParamsCIL, the transitions of the
sequential mixed semantics machine are defined by the function
\[ \delta^{\pi,\theta}_{MX} : C_{MX} \times \Sigma_{MX} \to C_{MX\bot} \]
where C_{MX⊥} def≡ CMX ∪ {⊥} contains the error state ⊥ and ΣMX def≡ B5 → B32 is an input
alphabet used for the non-deterministic choice of the GPR content during the call of a MASM procedure
from a C-IL context and the return to a MASM procedure from C-IL.8
In order to define the MX transition function on an MX configuration c ∈ CMX, we introduce
the following shorthands:
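• if the active context is MASM, i.e., masm(c.ac):
\[ c_\mu \equiv \mathit{conf}_{\mathrm{MASM}}(c), \qquad c'_\mu \equiv \delta^{\pi_\mu}_{\mathrm{MASM}}(c_\mu), \qquad \mathit{instr}_\mu \equiv \mathit{instr}_{next}(c_\mu, \pi_\mu) \]
• if the active context is C-IL, i.e., cil(c.ac):
\[ c_{cil} \equiv \mathit{conf}_{\mathrm{CIL}}(c), \qquad c'_{cil} \equiv \delta^{\pi_{cil},\theta}_{\mathrm{CIL}}(c_{cil}), \qquad \mathit{stmt}_{cil} \equiv \mathit{stmt}_{next}(c_{cil}, \pi_{cil}) \]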
⁸ Note that although this non-deterministic initialization was introduced in [Sha12], the original version of the
transition function in [Sha12] has no additional input and therefore had to be formalized as a transition relation.
5 Concurrent Mixed Machine Semantics for MIPS-86
Then, the result of the computation \( \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \) with in ∈ ΣMX is defined by the cases given
below, depending on the active context and the instruction/statement to be executed. In all
other situations, in addition to the error states generated by the C-IL and MASM semantics, we set
\( \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} \bot \). Note that the input in is ignored in all cases except switching from
C-IL to MASM.
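• Pure MASM step:
– the active context is of MASM type, i.e., masm(c.ac) holds
– in case of a procedure call the procedure is defined in the MASM program
\[ (\mathit{instr}_\mu = \mathbf{call}\ pn) \Rightarrow pn \in \mathrm{dom}(\pi_\mu) \land \lnot \mathit{ext}(pn, \pi_\mu) \]
– in case of a procedure return the active context contains more than one stack frame
\[ (\mathit{instr}_\mu = \mathbf{ret}) \Rightarrow \mathit{nf}_{ac}(c) > 1 \]
If the conditions hold, the result of the transition is computed as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} \begin{cases} c' & : c'_\mu \neq \bot \\ \bot & : \text{otherwise} \end{cases} \]
where the non-error state c′ has the components
\[ c'.ac = (c'_\mu.\mathit{stack},\ c'_\mu.\mathit{gpr}), \qquad c'.\mathit{spr} = c'_\mu.\mathit{spr}, \qquad c'.ic = c.ic, \qquad c'.M = c'_\mu.M \]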
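• Pure C-IL step:
– the active context is of C-IL type, i.e., cil(c.ac) holds
– in case of a function call the function is either defined in the C-IL program or the
external intrinsic function cas
\[ \mathit{stmt}_{cil} \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \Rightarrow \mathit{isfunc}\big([\![e]\!]^{\pi_{cil},\theta}_{c_{cil}}, f, \theta\big) \land f \in \mathrm{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \land \big(\lnot \mathit{ext}(f, \pi_{cil}, \theta) \lor f = \mathit{cas}\big) \]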
– in case of a function return the active context contains more than one stack frame
\[ \mathit{stmt}_{cil} \in \{\mathbf{return}\ e,\ \mathbf{return}\} \Rightarrow \mathit{nf}_{ac}(c) > 1 \]
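Again, if the conditions hold, the result of the transition is computed as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} \begin{cases} c' & : c'_{cil} \neq \bot \\ \bot & : \text{otherwise} \end{cases} \]
where the non-error state c′ has the components
\[ c'.ac = c'_{cil}.\mathit{stack}[\mathit{nf}_{ic}(c) + 1 : \mathit{top}(c'_{cil})], \qquad c'.\mathit{spr} = c.\mathit{spr}, \qquad c'.ic = c.ic, \qquad c'.M = c'_{cil}.M \]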
• Call from C-IL to MASM:
– the active context is of C-IL type, i.e., cil(c.ac) holds
– the statement to be executed is the function/procedure call
\[ \mathit{stmt}_{cil} \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \]
– the function/procedure is declared in the C-IL program as an external one
\[ \mathit{isfunc}\big([\![e]\!]^{\pi_{cil},\theta}_{c_{cil}}, f, \theta\big) \land f \in \mathrm{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \land \mathit{ext}(f, \pi_{cil}, \theta) \land f \neq \mathit{cas} \]
– and implemented in the MASM program
\[ f \in \mathrm{dom}(\pi_\mu) \land \lnot \mathit{ext}(f, \pi_\mu) \]
– the number of parameters matches the procedure declaration
\[ |E| = \mathcal{F}^{\theta}_{\pi_{cil}}(f).\mathit{npar} = \pi_\mu(f).\mathit{npar} \]
– the value of each parameter can be represented as a bit-string (which excludes local
references and symbolic function values)
\[ \forall j \in [1 : |E|].\ [\![E[j]]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(b_j, t_j) \]
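Under these conditions the result of the transition is defined as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} c' \]
where the new configuration c′ ∈ CMX is formed as
\[ c'.ac = (\mathit{frame}'_\mu,\ \mathit{gpr}'), \qquad c'.\mathit{spr} = c.\mathit{spr}, \qquad c'.ic = c.ic \circ \big(\mathit{stack}'_{cil}[\mathit{nf}_{ic}(c) + 1 : \mathit{top}(c_{cil})]\big), \qquad c'.M = c.M \]
such that the GPRs of the new active MASM context are initialized with values of the
parameters according to the calling convention or the arbitrarily chosen values from the
input in (meaning that a program written in MX must guarantee the expected result for
any such input)
\[ \mathit{gpr}'(r) = \begin{cases} b_j & : r = i_j \land j \in [1 : \mathit{npar}^{\pi_\mu}_{regs}(f)] \\ 0^{32} & : r = \mathit{zero} \\ \mathit{in}(r) & : \text{otherwise} \end{cases} \]
and the components of the newly created stack frame are initialized as follows:
\[ \mathit{frame}'_\mu.p = f, \qquad \mathit{frame}'_\mu.\mathit{loc} = 1, \qquad \mathit{frame}'_\mu.\mathit{lifo} = \varepsilon \]
\[ \mathit{frame}'_\mu.\mathit{saved}(r) = \begin{cases} \mathit{gpr}'(r) & : \langle r \rangle \in \pi_\mu(f).\mathit{uses} \\ \text{undefined} & : \text{otherwise} \end{cases} \]
\[ |\mathit{frame}'_\mu.\mathit{pars}| = |E| \]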
\[ \mathit{frame}'_\mu.\mathit{pars}[j] = \begin{cases} b_j & : j > \mathit{npar}^{\pi_\mu}_{regs}(f) \\ 0^{31} & : \text{otherwise} \end{cases} \]
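Moreover, the C-IL context becomes inactive after storing the return value destination and
increasing the location counter
\[ \mathit{rds}' \equiv \begin{cases} [\![\&(e_0)]\!]^{\pi_{cil},\theta}_{c_{cil}} & : \mathit{stmt}_{cil} = (e_0 = \mathbf{call}\ e(E)) \\ \bot & : \mathit{stmt}_{cil} = \mathbf{call}\ e(E) \end{cases} \]
\[ \mathit{stack}'_{cil} \equiv \mathit{inc}_{loc}(\mathit{set}_{rds}(c_{cil}, \mathit{rds}')).\mathit{stack} \]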
• Return from MASM to C-IL:
– the active context is of MASM type, i.e., masm(c.ac) holds
– the instruction to be executed is the procedure return, i.e., instrµ = ret
– there is only one stack frame in the active MASM context and the list of inactive
contexts is not empty
\[ \mathit{nf}_{ac}(c) = 1 \land c.ic \neq \varepsilon \]
In order to define the result of the computation \( \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \), we first form an intermediate
MX configuration ĉ ∈ CMX with components
\[ \hat{c}.ac = c.ic[ni], \qquad \hat{c}.\mathit{spr} = c.\mathit{spr}, \qquad \hat{c}.ic = c.ic[1 : ni - 1], \qquad \hat{c}.M = c.M \]
where ni ≡ |c.ic| denotes the number of inactive contexts in the configuration. Hence, the
C-IL configuration ĉcil ∈ CCIL constructed from ĉ is
\[ \hat{c}_{cil} \equiv \mathit{conf}_{\mathrm{CIL}}(\hat{c}) \]
Now, we can compute the transition result on the C-IL configuration ĉcil similarly to the
function return in the C-IL semantics. Let the return destination a ∈ val ∪ {⊥} and the
value v ∈ val returned for a ≠ ⊥ be
\[ a \equiv \mathit{rds}_{top}(\hat{c}_{cil}) \]
\[ v \equiv \mathbf{val}(c.ac.\mathit{gpr}(rv), t) \quad \text{with } \tau(a) \in \{\mathbf{ptr}(t), \mathbf{array}(t, n)\} \]
Then the resulting C-IL configuration ĉ′cil ∈ CCIL is
\[ \hat{c}'_{cil} \equiv \begin{cases} \mathit{write}(\theta, \mathit{set}_{rds}(\hat{c}_{cil}, \bot), a, v) & : a \neq \bot \\ \hat{c}_{cil} & : \text{otherwise} \end{cases} \]
and we define the effect of the return from MASM to C-IL as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} c' \]
\[ c'.ac = \hat{c}'_{cil}.\mathit{stack}[\mathit{nf}_{ic}(\hat{c}) + 1 : \mathit{top}(\hat{c}'_{cil})], \qquad c'.\mathit{spr} = \hat{c}.\mathit{spr}, \qquad c'.ic = \hat{c}.ic, \qquad c'.M = \hat{c}'_{cil}.M \]
• Call from MASM to C-IL:
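– the active context is of MASM type, i.e., masm(c.ac) holds
– the instruction to be executed is the procedure call
\[ \mathit{instr}_\mu = \mathbf{call}\ f \]
– the function/procedure is declared as an external one in the MASM program
\[ f \in \mathrm{dom}(\pi_\mu) \land \mathit{ext}(f, \pi_\mu) \]
– and implemented in the C-IL program
\[ f \in \mathrm{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \land \lnot \mathit{ext}(f, \pi_{cil}, \theta) \]
– the numbers of parameters in the MASM and C-IL declarations are equal
\[ \pi_{cil}.\mathcal{F}(f).\mathit{npar} = \pi_\mu(f).\mathit{npar} \]
– the lifo of the top-most frame of the active context contains at least as many elements
as the number of function parameters to be passed on the stack:
\[ |\mathit{lifo}_{top}(c_\mu)| \geq \mathit{npar}^{\pi_\mu}_{stack}(f) \]
If these conditions hold, the result of the transition is defined as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} c' \]
\[ c'.ac = \mathit{frame}'_{cil}, \qquad c'.\mathit{spr} = c.\mathit{spr}, \qquad c'.ic = c.ic \circ \big(\mathit{stack}'_\mu,\ c.ac.\mathit{gpr}|_{\mathit{Reg}_{callee}}\big), \qquad c'.M = c.M \]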
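where the components of the newly created stack frame are initialized as follows:
\[ \mathit{frame}'_{cil}.f = f, \qquad \mathit{frame}'_{cil}.\mathit{loc} = 1, \qquad \mathit{frame}'_{cil}.\mathit{rds} = \bot \]
\[ \mathit{frame}'_{cil}.\mathcal{M}_E(v) = \begin{cases} \mathit{bits2bytes}(c.ac.\mathit{gpr}(r)) & : V_f[j] = (v, t) \land r = i_j \land j \in [1 : \mathit{npar}^{\pi_\mu}_{regs}(f)] \\ \mathit{bits2bytes}(\mathit{pars}'[j - 4]) & : V_f[j] = (v, t) \land j \in [5 : \pi_\mu(f).\mathit{npar}] \\ (0^8)^{\mathit{size}_\theta(\mathit{qt2t}(t))} & : V_f[j] = (v, t) \land j > \pi_\mu(f).\mathit{npar} \\ \text{undefined} & : \text{otherwise} \end{cases} \]
with the shorthands \( V_f \equiv \mathcal{F}^{\theta}_{\pi_{cil}}(f).V \) and \( \mathit{pars}' \equiv \mathit{rev}\big(\mathit{lifo}_{top}(c_\mu)[m : n]\big) \) for the indices
\[ n \equiv |\mathit{lifo}_{top}(c_\mu)|, \qquad m \equiv n + 1 - \mathit{npar}^{\pi_\mu}_{stack}(f). \]
Moreover, the active MASM context becomes inactive with the snapshot of the callee-save
registers \( c.ac.\mathit{gpr}|_{\mathit{Reg}_{callee}} \) and the updated stack
\[ \mathit{stack}'_\mu \equiv \mathit{inc}_{loc}\big(\mathit{pop}_{lifo}(c_\mu, \mathit{npar}^{\pi_\mu}_{stack}(f))\big).\mathit{stack} \]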
• Return from C-IL to MASM:
– the active context is of C-IL type, i.e., cil(c.ac) holds
– the statement to be executed is function return
stmtcil ∈ {return e, return}
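– there is only one stack frame in the active C-IL context and the list of inactive contexts
is not empty
\[ \mathit{nf}_{ac}(c) = 1 \land c.ic \neq \varepsilon \]
– the return value can be represented as a bit-string
\[ (\mathit{stmt}_{cil} = \mathbf{return}\ e) \Rightarrow [\![e]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(b, t) \]
Under these conditions the result of the step is defined as
\[ \delta^{\pi,\theta}_{\mathrm{MX}}(c, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} c' \]
\[ c'.ac = (c.ic[ni].\mathit{stack},\ \mathit{gpr}'), \qquad c'.\mathit{spr} = c.\mathit{spr}, \qquad c'.ic = c.ic[1 : ni - 1], \qquad c'.M = c.M \]
where ni ≡ |c.ic| is the number of inactive contexts in the configuration c and the content
gpr′ of the general-purpose registers in the active MASM context is initialized in the
following way:
\[ \mathit{gpr}'(r) = \begin{cases} c.ic[ni].\mathit{gpr}_{callee}(r) & : r \in \mathit{Reg}_{callee} \\ 0^{32} & : r = \mathit{zero} \\ b & : r = rv \land (\mathit{stmt}_{cil} = \mathbf{return}\ e) \\ \mathit{in}(r) & : \text{otherwise} \end{cases} \]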
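The case distinction for gpr′ on the return from C-IL to MASM can be sketched in Python as follows. This is only an illustration under stated assumptions: registers are encoded as integers 0 to 31, the indices chosen for zero and rv (0 and 2) are hypothetical defaults, and the input in is modelled as a function from register index to value.

```python
# Sketch of gpr' for "Return from C-IL to MASM".
# Assumptions (not from the thesis): registers are ints 0..31, zero=0, rv=2,
# `saved_callee` is the snapshot stored in the inactive MASM context,
# `inp` models the non-deterministic input `in`.
def gpr_after_return(saved_callee, inp, ret_bits=None, zero=0, rv=2):
    gpr = {}
    for r in range(32):
        if r in saved_callee:                 # r ∈ Reg_callee: restore snapshot
            gpr[r] = saved_callee[r]
        elif r == zero:                       # zero register is always 0^32
            gpr[r] = 0
        elif r == rv and ret_bits is not None:  # statement was "return e"
            gpr[r] = ret_bits
        else:                                 # arbitrary value taken from input
            gpr[r] = inp(r)
    return gpr
```

Since every register not covered by the first three cases is taken from the input, a program in the MX model cannot rely on the content of such registers after the return.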
5.4 Concurrent Mixed Machine Semantics
Finally, we can define the concurrent mixed machine semantics in which the steps of nt ∈
N sequential mixed machines (or MX threads) with the shared memory are interleaved non-
deterministically.
First, we collect the MX machine components characterizing individual MX threads.
Definition 5.65 (MX Thread Configuration). A configuration of an MX thread consists of the
active context, a list of inactive contexts, and the special purpose registers
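\[ K_{\mathrm{MX}} \stackrel{\mathrm{def}}{\equiv} \big( ac \in \mathit{context}_{\mathrm{CIL}} \cup \mathit{context}^{active}_{\mathrm{MASM}},\ \ ic \in (\mathit{context}_{\mathrm{CIL}} \cup \mathit{context}^{inactive}_{\mathrm{MASM}})^{*},\ \ \mathit{spr} : \mathbb{B}^5 \to \mathbb{B}^{32} \big) \]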
Definition 5.66 (Configuration of Concurrent MX Machine). The configurations of the concurrent
mixed machine containing nt ∈ N MX threads are defined by the set
\[ C_{\mathrm{cMX}} \stackrel{\mathrm{def}}{\equiv} \big( k : \mathbb{N}_{nt} \to K_{\mathrm{MX}},\ \ M : \mathbb{B}^{32} \to \mathbb{B}^{8} \big) \]
where k is a mapping from the thread index to an MX thread configuration, and M is the shared
memory.
Definition 5.67 (Sequential MX Machine Configuration from Concurrent MX). For a given
MX thread configuration k ∈ KMX and a memory configuration M we introduce the auxiliary
function confMX(k, M) ∈ CMX combining both arguments into a sequential MX machine
configuration
\[ \mathit{conf}_{\mathrm{MX}}(k, M) \stackrel{\mathrm{def}}{\equiv} (k.ac,\ k.ic,\ k.\mathit{spr},\ M) \]
Then we overload the same function for a given concurrent MX configuration c ∈ CcMX and
t ∈ N_nt as
\[ \mathit{conf}_{\mathrm{MX}}(c, t) \stackrel{\mathrm{def}}{\equiv} \mathit{conf}_{\mathrm{MX}}(c.k(t), c.M) \]
We call the configuration of the MX thread t well-formed if and only if its corresponding
sequential MX machine configuration is well-formed, i.e., \( \mathit{wfconf}^{\pi,\theta}_{\mathrm{MX}}(\mathit{conf}_{\mathrm{MX}}(c, t)) \)
holds.
Definition 5.68 (Concurrent MX Transition Function). Now, for a given MX program π ∈
ProgMX and the environment parameters θ ∈ ParamsCIL, the transitions of the concurrent
mixed machine are defined by the function
\[ \delta^{\pi,\theta}_{\mathrm{cMX}} : C_{\mathrm{cMX}} \times \mathbb{N}_{nt} \times \Sigma_{\mathrm{MX}} \to C_{\mathrm{cMX}\bot} \]
such that for a configuration c ∈ CcMX, an index t ∈ N_nt of an MX thread performing a step,
and an input in ∈ ΣMX, the result of the transition (that can also lead to the run-time error ⊥) is
defined as
\[ \delta^{\pi,\theta}_{\mathrm{cMX}}(c, t, \mathit{in}) \stackrel{\mathrm{def}}{\equiv} \begin{cases} (c.k[t \mapsto k'_t],\ c'_t.M) & : c'_t \neq \bot \\ \bot & : \text{otherwise} \end{cases} \]
where the next sequential MX configuration c′t is computed as \( c'_t \equiv \delta^{\pi,\theta}_{\mathrm{MX}}(\mathit{conf}_{\mathrm{MX}}(c, t), \mathit{in}) \) and
the next MX thread configuration is \( k'_t \equiv (c'_t.ac,\ c'_t.ic,\ c'_t.\mathit{spr}) \).
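The way the concurrent transition wraps the sequential one can be illustrated by a short Python sketch. The encoding is hypothetical: a concurrent configuration is a pair (threads, M) with threads a dictionary from thread index to a triple (ac, ic, spr), sequential configurations are 4-tuples (ac, ic, spr, M), and None plays the role of the error state ⊥. The sequential transition function is passed in as a parameter and is not modelled here.

```python
# Sketch of the concurrent step built on top of a sequential step function.
# Assumptions (illustrative only): tuples/dicts as configurations, None as ⊥.
def step_concurrent(c, t, inp, delta_mx):
    threads, mem = c
    ac, ic, spr = threads[t]
    # conf_MX(c, t): combine thread t with the shared memory and step it
    nxt = delta_mx((ac, ic, spr, mem), inp)
    if nxt is None:                      # run-time error of thread t
        return None
    ac2, ic2, spr2, mem2 = nxt
    new_threads = dict(threads)          # all other threads stay unchanged
    new_threads[t] = (ac2, ic2, spr2)    # k[t ↦ k'_t]
    return (new_threads, mem2)           # shared memory comes from thread t's step
```

The sketch makes explicit that a step of thread t can change only its own thread configuration and the shared memory.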

6 Compiler Correctness and Justification of Concurrent Mixed Model
This chapter is devoted to the detailed definition of compiler correctness for our sequential
mixed machine with all necessary requirements in the concurrent context. Using this cor-
rectness statement we will consider the justification of the concurrent MX model with np MX
threads whose compiled code runs on the multi-core MIPS-86 ISA model with np processors.
Originally, MX compiler correctness for a single-core VAMP processor was described by
Shadrin in his doctoral thesis [Sha12] based on the previous work [Lei08, LPP05] from the
Verisoft project [Ver10]. Having developed the model for concurrent systems and simulation,
Baumann [Bau14b] restated these correctness criteria separately for C-IL and MASM implemented
on MIPS-86 and showed how one can establish the concurrent simulation in both
cases. However, that work was done under the assumption that a store buffer reduction similar to
the one considered in [Kov13] was applied to the reference MIPS-86 machine, and neither the software
conditions nor the policy needed for the reduction to go through (see Chapter 4) were given.
Using the results from [Sha12, Kov13, Bau14b], we continue the research on this topic for the
overall MX semantics. Particularly, since we will use the full mathematical statement of the MX
compiler correctness in order to introduce more powerful semantics for system programming
in the scope of this thesis, we adapt the results from the previous work and resolve issues
not matching the argumentation about the higher levels in our model stack. Moreover, we
especially concentrate on software conditions and safety policy enabling both sequential and
concurrent simulation between the abstract MX semantics and its implementation.
6.1 Compiler Correctness for Sequential Mixed Machine
6.1.1 Mixed Program Compilation and Stack Layout
The compilation of any mixed machine program π = (πµ, πcil) ∈ ProgMX is done separately by
the C-IL and MASM compilers. The compiled code is placed into the memory by a linker at the
corresponding code base addresses cbaµ, cbacil ∈ B32, which are considered in this work as system
parameters or determined by the linker. Later we will introduce conditions on the placement of
the code.
A bird's-eye view of the stack implementation of the mixed machine is given in Figure 6.1.
The allocated stack consists of the space occupied by alternating C-IL and MASM
stack regions and the free space that can still be used while the stack is not full. Let mss ∈ N denote the
maximal stack size in bytes. Then, the stack grows down starting from the mss-th byte of the
stack (left corner), pointed to by the stack base address, and occupies the memory up to the topmost
frame. The first byte (right corner) of this frame is addressed by the stack pointer residing in the
GPRs. The frame base pointer in the corresponding register points to a special position inside
the frame.
Figure 6.1: Overview of MX stack layout in memory.
First, we consider the compilation of the C-IL and MASM programs, and the physical layout
of stack frames for each case in detail.
6.1.1.1 Macro Assembly Compiler Information and Stack Frames
During the program compilation (or code generation) the MASM compiler produces static in-
formation.
Definition 6.1 (Macro Assembly Compiler Information). We define the set of tuples represent-
ing the MASM compiler information as
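\[ \mathit{infoT}_{\mathrm{MASM}} \stackrel{\mathrm{def}}{\equiv} \big( \mathit{code} \in \mathbb{I}^{*}_{\mathrm{ASM}},\ \ \mathit{off} : \mathcal{P}_{name} \times \mathbb{N} \rightharpoonup \mathbb{N}_0 \big) \]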
where the components have the following meaning:
• code – a list of MIPS-86 assembly instructions representing a given compiled MASM pro-
gram,
• off – a function calculating the offset in the compiled code for the first assembly instruc-
tion implementing a MASM instruction at a given location in a given procedure.
Figure 6.2: Layout of MASM stack frames. The shadowed parts represent in detail the space
where the parameters and callee-saved registers are stored. The (i + 1)-th frame is the topmost
one here.
Definition 6.2 (MASM Compilation Function). Throughout the thesis we will use the
uninterpreted function
\[ \mathit{cmpl}_{\mathrm{MASM}} : \mathit{Prog}_{\mathrm{MASM}} \rightharpoonup \mathit{infoT}_{\mathrm{MASM}} \]
generating the static compiler information for any given MASM program. The function is
similar to the one considered in detail in [Sha12] and we do not provide its definition here.
Definition 6.3 (Code Address of MASM Instruction/Macro). Given the compiler information
infoµ ∈ infoT_MASM and the code base address cbaµ, one can compute the starting code
address of a MASM instruction at the location loc ∈ N in a procedure p ∈ Pname as
\[ \mathit{ca}_{\mathrm{MASM}}(p, \mathit{loc}, \mathit{info}_\mu, \mathit{cba}_\mu) \stackrel{\mathrm{def}}{\equiv} \mathit{cba}_\mu +_{32} (4 \cdot \mathit{info}_\mu.\mathit{off}(p, \mathit{loc}))_{32} \]
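As a small worked illustration of this address computation, the following Python sketch treats 32-bit bit-strings as non-negative integers and reads +32 as addition modulo 2^32 (this integer encoding and the function name are assumptions of the sketch). The factor 4 reflects that the offset counts instructions, each occupying one 4-byte word.

```python
MASK32 = 0xFFFFFFFF  # 32-bit addresses are modelled as ints in [0, 2**32)

def code_address(cba, off):
    """Code base address plus 4 bytes per instruction offset, modulo 2**32."""
    return (cba + 4 * off) & MASK32
```

For instance, with a code base address of 0x1000 the instruction at offset 3 starts at 0x100C. The same computation applies to the C-IL code addresses, which use the C-IL code base address and the offsets from the C-IL compiler information.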
The layout of the MASM stack frames in the memory is depicted in Figure 6.2. Each frame
contains the space for parameters, the return code address of the procedure, callee-save registers,
and the lifo. Any frame can be identified by its base address, which points to the word on the stack
keeping the base address of the previous frame, i.e., the previous base pointer pbp. The base address
of the topmost frame is kept in the GPR bp. The parameters are saved in the order defined by
the compiler calling convention such that the home addresses for the first \( \mathit{nreg} = \mathit{npar}^{\pi_\mu}_{regs}(p) \)
parameters are reserved on the stack.
In the thesis we will particularly rely on the implementation of MASM instruction ret ∈
SMASM given in [Sha12] and corresponding to the compiler calling convention for MIPS-86.
The compiled code of this instruction called on the return from any procedure p performs the
following steps:
1. restore callee-save registers present in πµ(p).uses,
2. adjust the stack pointer so that the stack is cleaned up from the parameters,
3. restore the base pointer from pbp saved in the frame,
4. return to the caller by jumping to the code at the return address stored in the frame.
Here, we do not present the formal definition of ret compilation and refer the interested reader
to Section 3.2 of [Sha12] devoted to the topic of assembling MASM programs.
A memory word residing in the memory occupied by the stack is called a stack item. Obvi-
ously, addressing stack items is word-aligned.
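Definition 6.4 (Stack Items in Memory). Given an ISA memory m : B32 → B8, an address
ad ∈ B32 pointing to a stack item, and an index i ∈ Z of another stack item above/below the
given one (relative offset s.t. i = 0 for the item pointed to by ad), we compute the value of the i-th
item as
\[ \mathit{item}(m, ad, i) \stackrel{\mathrm{def}}{\equiv} m_4\big(\mathit{bin}_{32}(\langle ad \rangle + 4 \cdot i)\big) \]
Moreover, we can read a list of n ∈ N items starting at the address ad:
\[ \mathit{items}_{\uparrow}(m, ad, n) \stackrel{\mathrm{def}}{\equiv} \begin{cases} \varepsilon & : n = 0 \\ \mathit{items}_{\uparrow}(m, ad, n - 1) \circ \mathit{item}(m, ad, n - 1) & : \text{otherwise} \end{cases} \]
\[ \mathit{items}_{\downarrow}(m, ad, n) \stackrel{\mathrm{def}}{\equiv} \begin{cases} \varepsilon & : n = 0 \\ \mathit{items}_{\downarrow}(m, ad, n - 1) \circ \mathit{item}(m, ad, -(n - 1)) & : \text{otherwise} \end{cases} \]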
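The recursive definitions above unfold to simple list comprehensions. A minimal Python sketch follows; it assumes that the memory is a dictionary from byte address to byte value, that addresses wrap modulo 2^32, and that the four bytes of a word are assembled in little-endian order. The byte order and the function names are assumptions of this sketch and are not fixed by the definition above.

```python
def item(m, ad, i):
    """Word at word offset i relative to byte address ad.
    m maps byte addresses to byte values (hypothetical encoding)."""
    base = (ad + 4 * i) % 2**32
    # little-endian assembly of four consecutive bytes (assumption)
    return sum(m[(base + k) % 2**32] << (8 * k) for k in range(4))

def items_up(m, ad, n):
    """items↑: n items at ad, ad+4, ..., ad+4(n-1), in that order."""
    return [item(m, ad, k) for k in range(n)]

def items_down(m, ad, n):
    """items↓: n items at ad, ad-4, ..., ad-4(n-1), in that order."""
    return [item(m, ad, -k) for k in range(n)]
```

In both lists the first element is the item at ad itself, so items↑ walks towards higher addresses and items↓ towards lower ones.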
Definition 6.5 (Computation of Common Stack Frame Components from Memory). Given
an ISA memory m : B32 → B8 and a base address ba ∈ B32 of a corresponding frame in the
memory, we retrieve the previous base pointer and the return address saved in the frame as
\[ \mathit{pbp}(m, ba) \stackrel{\mathrm{def}}{\equiv} \mathit{item}(m, ba, 0) \]
\[ \mathit{ra}(m, ba) \stackrel{\mathrm{def}}{\equiv} \mathit{item}(m, ba, 1) \]
Note that these items are common for MASM and C-IL frames (considered later).
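Definition 6.6 (Computation of MASM Stack Frame Components from Memory). Analogously,
other components of a MASM frame can be computed as follows:
• a list of n ∈ N0 input parameters
\[ \mathit{pars}^{\pi_\mu}_{\mathrm{MASM}}(m, ba, p) \stackrel{\mathrm{def}}{\equiv} \mathit{items}_{\uparrow}(m,\ ba +_{32} 8_{32},\ \pi_\mu(p).\mathit{npar}) \]
• saved registers (in order corresponding to the list)
Using ns ≡ |πµ(p).uses| we compute
\[ \mathit{saved}^{\pi_\mu}_{\mathrm{MASM}}(m, ba, p) : \mathbb{B}^5 \rightharpoonup \mathbb{B}^{32} \]
\[ \mathit{saved}^{\pi_\mu}_{\mathrm{MASM}}(m, ba, p)(r) \stackrel{\mathrm{def}}{\equiv} \begin{cases} \mathit{items}_{\downarrow}(m,\ ba -_{32} 4_{32},\ ns)[i] & : \langle r \rangle = \pi_\mu(p).\mathit{uses}[i] \land i \in \mathbb{N}_{ns} \\ \text{undefined} & : \text{otherwise} \end{cases} \]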
• lifo
\[ \mathit{lifo}^{\pi_\mu}_{\mathrm{MASM}}(m, ba, p, n) \stackrel{\mathrm{def}}{\equiv} \mathit{items}_{\downarrow}(m,\ ba -_{32} ds_{32},\ n) \]
where ds ≡ (|πµ(p).uses| + 1) · 4.
In addition to the retrieval of the stack components from the memory, we introduce a function
evaluating one of the stack layout characteristics on the basis of the abstract stack
configuration.
Definition 6.7 (Distance between Frame Base Addresses in MASM Stack). Given a non-empty
MASM stack \( st \in \mathit{frame}^{*}_{\mathrm{MASM}} \) and a stack frame index \( i \in \mathbb{N}_{top(st)} \), we define the function
distMASM(st, i) ∈ N computing the distance (in bytes) between the frame base addresses of
the i-th and (i + 1)-th frames for a non-topmost i-th frame, or the distance from the base
address of the topmost frame to the stack pointer, as follows:
\[ \mathit{dist}_{\mathrm{MASM}}(st, i) \stackrel{\mathrm{def}}{\equiv} \begin{cases} 4 \cdot \big(\#\mathrm{dom}(\mathit{saved}_i(st)) + |\mathit{lifo}_i(st)|\big) & : i = \mathit{top}(st) \\ 4 \cdot \big(\#\mathrm{dom}(\mathit{saved}_i(st)) + |\mathit{lifo}_i(st)| + |\mathit{pars}_{i+1}(st)| + 2\big) & : i < \mathit{top}(st) \end{cases} \]
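The distance function can be sketched in Python as follows. The frame encoding (a dictionary with the keys "saved", "lifo" and "pars") is a hypothetical simplification of the abstract MASM frame; frames are indexed from 1 as in the definition, with the last list element being the topmost frame.

```python
# Sketch of dist_MASM under an assumed frame encoding:
# each frame is a dict with "saved" (dict), "lifo" (list), "pars" (list).
def dist_masm(stack, i):
    top = len(stack)                 # index of the topmost frame
    frame = stack[i - 1]             # i is 1-based
    words = len(frame["saved"]) + len(frame["lifo"])
    if i < top:
        # below frame i lie the parameters of frame i+1 plus two words,
        # the return address and the previous base pointer
        words += len(stack[i]["pars"]) + 2
    return 4 * words                 # one stack item is a 4-byte word
```

For a two-frame stack whose bottom frame saves one register and holds two lifo entries, and whose top frame has three parameters, the distance between the two base addresses is 4 · (1 + 2 + 3 + 2) = 32 bytes.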
6.1.1.2 C-IL Compiler Information and Stack Frames
Compared to MASM, the C-IL stack frames in the memory (see Figure 6.3) additionally contain
the space for local variables (separate from the parameters), caller-save registers which are
saved by the C-IL compiled code on a nested function call, the return destination, i.e., the
address at which the return value must be saved on the function return, and a region for
temporary values used by the compiler.
The compiled code of any function call (see Figure 6.4) consists of two parts: a pre-call exe-
cuted before the jump to the callee’s code, and an epilogue finishing the call after returning from
the callee. The generated code of a function contains a starting portion of code (or prologue)
preparing the frame before the function execution. The code is placed into the memory at the
address determined for a given function by the C-IL environment parameters θ.
During the execution of the pre-call, the calling function saves the caller-save registers and
return value destination on the stack, allocates the space for the callee’s parameters and puts the
arguments either on the stack or into registers according to the calling convention. Moreover,
the return code address is saved before the jump to the callee is performed.
In turn, the callee’s prologue is responsible for storing the frame base pointer of the caller,
setting up the base pointer of the new frame in the GPR, allocation and initialization of local
variables, as well as storing the callee-save registers on the stack.
The implementation of the return from the function is similar to the one considered for MASM
above. After the return, the caller executes the epilogue storing the return value into the mem-
ory, restoring the content of caller-save registers, etc. The exact code generation depends on the
compiler and we will not rely on it here. The only crucial fact we will use in the work is which
items of the callee’s frame must be prepared by the caller.
Now, similarly to the MASM compilation, we introduce the static compiler information for
C-IL. However, since we assume an optimizing compiler, the information is more sophisticated.
Definition 6.8 (C-IL Compiler Information). We define the set of tuples representing the C-IL
compiler information as
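\[ \mathit{infoT}_{\mathrm{CIL}} \stackrel{\mathrm{def}}{\equiv} (\mathit{code},\ \mathit{off},\ \mathit{off}_{elog},\ \mathit{cp},\ \mathit{off}_{lvar},\ \mathit{reg}_{lvar},\ \mathit{size}_{crreg},\ \mathit{off}_{crreg},\ \mathit{size}_{tmp},\ \mathit{off}_{vol}) \]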
where the components have the following meaning:
Figure 6.3: Layout of C-IL stack frames.
• code ∈ I∗ASM – a list of MIPS-86 assembly instructions representing a given compiled C-IL
program,
• cp : Fname × N → B – a flag indicating whether the C-IL machine is in the consistency
point right before execution of a statement at a given location in a considered function,
• off : Fname × N ⇀ N0 – the offset in the list code for the first assembly instruction imple-
menting a C-IL statement at a specified location (only a consistency point) in a given C-IL
function,
• offelog : Fname × N ⇀ N0 – the offset in code for the epilogue of a function call in a given
function at a given location,¹
¹ The need for this piece of information, which is not present in the original work [Sha12], was first discovered by
the author of this thesis during the research on the correctness of the thread switch. Later, the corresponding
component was added to infoT_CIL by Christoph Baumann in his doctoral thesis [Bau14b].
Figure 6.4: Execution of the compiled code during the function call and return. The numbers
next to the arrows show the order of the execution.
• offlvar : V×Fname ⇀ N – computes the offset (in bytes) of a local variable (excluding input
parameters) on the stack relative to the frame base pointer for a given function.
In contrast to this function in [Bau14b] additionally relying on being in a consistency point,
we consider this offset to be static after the compilation because the allocation of the local
variables is performed in the prologue part of the callee and cannot be changed during
the function execution. Note that for simplicity we assume that the optimizing compiler
allocates all local variables on the stack though their values may be kept in registers.²
• reglvar : V × Fname × N ⇀ B5 ∪ {⊥} – a function returning (if defined) a general purpose
register in which the optimizing compiler keeps a value of a given local variable/parame-
ter at the consistency point specified by a function name and a location in the body of the
function. If the variable is not in a register, the function returns ⊥.
Note that we do not forbid the compiler to reallocate the registers for local variables
(including parameters) during function calls. So, for a function call at a location loc in a
function f and a local variable v we do not forbid reglvar(v, f, loc) ≠ reglvar(v, f, loc + 1).
Naturally, the variables/parameters whose current values reside on the stack after the
function call must be stored to the corresponding stack regions during this call. When
a variable is supposed to be in a caller-save register after the function call, its value will
be restored by the caller's epilogue from the caller-save region.
• sizecrreg : Fname×N⇀ N0 – specifies the amount of bytes needed for the caller-save region
on the stack in case of a function call in a specified function at a given location.
• offcrreg : Regcaller ×Fname ×N⇀ N0 – the offset in bytes within the caller-save region (rel-
ative to its upper end) where a given register is supposed to be saved during the function
call in a specified function at a given location.
² In fact, this is similar to the parameter allocation performed according to the calling convention where the home
addresses are always reserved on the stack (see Section 5.1.3).
• sizetmp : Fname × N ⇀ N0 – the size (in bytes) of the temporary region on the stack before
execution of a statement at a given location in a considered function. Here we rely on the
fact that this size is known by static program analysis. Therefore, the stack must be used
by the compiler in a restricted way.
• offvol : Fname × N ⇀ 2^N0 – the offsets (in the compiled code) of the assembly instructions
implementing volatile memory accesses in a C-IL statement at a specified location in a
given C-IL function.
By a volatile access we understand an access to the memory/stack through a pointer value
(computed during expression evaluation and belonging to valptr ∪ vallref) to a
volatile-qualified type. Since any C-IL statement can contain several such accesses, the result of the
function is a set of offsets in the compiled code. We will consider volatile accesses later
in the chapter and will use this compiler information under certain conditions.
In [Bau14b] a similar function computing a single offset was not included in the compiler
information, but was introduced as an uninterpreted function based on the compiler
information and other parameters. Since the qualifier volatile of a type signals to the
compiler how an access to the memory may be implemented, in this work we
consider this information directly available after the compilation. The idea of labeling
processor or assembly instructions implementing memory accesses as volatile accesses comes
from [CS10] and we use it in our work.
Definition 6.9 (C-IL Compilation Function). Similarly to MASM, we introduce the
uninterpreted function
\[ \mathit{cmpl}_{\mathrm{CIL}} : \mathit{Prog}_{\mathrm{CIL}} \rightharpoonup \mathit{infoT}_{\mathrm{CIL}} \]
generating the static compiler information for any given C-IL program.
Finally, we can introduce an auxiliary function that will be used later in the statement of the
compiler correctness.
Definition 6.10 (Code Address of C-IL Statement). Given the compiler information infocil ∈
infoT_CIL and the code base address cbacil, one can compute the starting code address of
a C-IL statement at the location loc ∈ N in a function f ∈ Fname as
\[ \mathit{ca}_{\mathrm{CIL}}(f, \mathit{loc}, \mathit{info}_{cil}, \mathit{cba}_{cil}) \stackrel{\mathrm{def}}{\equiv} \mathit{cba}_{cil} +_{32} (4 \cdot \mathit{info}_{cil}.\mathit{off}(f, \mathit{loc}))_{32} \]
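Definition 6.11 (Return Address in C-IL Function Call). The return address saved on the stack
during a function call is the starting address of the caller's epilogue and is computed as
\[ \mathit{rca}_{\mathrm{CIL}}(f, \mathit{loc}, \mathit{info}_{cil}, \mathit{cba}_{cil}) \stackrel{\mathrm{def}}{\equiv} \mathit{cba}_{cil} +_{32} (4 \cdot \mathit{info}_{cil}.\mathit{off}_{elog}(f, \mathit{loc}))_{32} \]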
Definition 6.12 (Size of Local Variables and Parameters on C-IL Stack). For a given C-IL
program πcil ∈ ProgCIL, the environment parameter θ ∈ ParamsCIL, and a function f ∈
dom(πcil.F), the sizes (in bytes) of stack regions required for allocation of parameters and local
variables of the function f are computed by \( \mathit{size}^{\pi_{cil},\theta}_{pars}(f) \in \mathbb{N}_0 \) and \( \mathit{size}^{\pi_{cil},\theta}_{lvars}(f) \in \mathbb{N}_0 \) respectively.
Formally, using the shorthands \( (v_{f,j}, t_{f,j}) \equiv \pi_{cil}.\mathcal{F}(f).V[j] \), \( \mathit{npar}_f \equiv \pi_{cil}.\mathcal{F}(f).\mathit{npar} \), and
\( \mathit{nlvar}_f \equiv |\pi_{cil}.\mathcal{F}(f).V| \) we define
\[ \mathit{size}^{\pi_{cil},\theta}_{pars}(f) \stackrel{\mathrm{def}}{\equiv} \sum_{j=1}^{\mathit{npar}_f} \mathit{size}_\theta(\mathit{qt2t}(t_{f,j})) \]
\[ \mathit{size}^{\pi_{cil},\theta}_{lvars}(f) \stackrel{\mathrm{def}}{\equiv} \sum_{j=\mathit{npar}_f + 1}^{\mathit{nlvar}_f} \mathit{size}_\theta(\mathit{qt2t}(t_{f,j})) \]
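Definition 6.13 (Distance between Frame Base Addresses in C-IL Stack). Given a C-IL program
πcil ∈ ProgCIL, the environment parameter θ ∈ ParamsCIL, the C-IL compiler information
infocil ∈ infoT_CIL, a non-empty C-IL stack \( st \in \mathit{frame}^{*}_{\mathrm{CIL}} \), and a stack frame index \( i \in \mathbb{N}_{top(st)} \),
by analogy with Definition 6.7 we provide the function
\[ \mathit{dist}^{\pi_{cil},\theta}_{\mathrm{CIL}}(st, i, \mathit{info}_{cil}) \stackrel{\mathrm{def}}{\equiv} \begin{cases} \mathit{dist}' & : i = \mathit{top}(st) \\ \mathit{dist}' + \mathit{dist}'' & : i < \mathit{top}(st) \end{cases} \]
where the distances dist′, dist′′ are computed as
\[ \mathit{dist}' \equiv \mathit{size}^{\pi_{cil},\theta}_{lvars}(f_i(st)) + 4 \cdot \#\mathit{Reg}_{callee} + \mathit{info}_{cil}.\mathit{size}_{tmp}(f_i(st), \mathit{loc}_i(st)) \]
\[ \mathit{dist}'' \equiv \mathit{info}_{cil}.\mathit{size}_{crreg}(f_i(st), \mathit{loc}_i(st) - 1) + 4 + \mathit{size}^{\pi_{cil},\theta}_{pars}(f_{i+1}(st)) + 4 \cdot 2 \]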
Note that for the non-topmost frames i the size of the temporary region is computed at
the location after the function call, which, in turn, requires a consistency point to be at this location.
Obviously, the size of this region during all callee executions must be equal to the one after the
return. We do not require this size to be the same before and after the function call.
Moreover, similarly to [Bau14b], for simplicity we assume that C-IL functions are compiled in
such a way that in C-IL the callee's prologue stores all callee-save registers on the stack. Otherwise,
values of local variables for lower stack frames could be kept in registers during further function
calls and saved by one of the callees in a much higher frame on the stack if the callee
modified the corresponding registers.
Definition 6.14 (Local Variable Address on C-IL Stack). Given a base address ba ∈ B32 for a
frame of a C-IL function f ∈ Fname and the C-IL compiler information infocil, we define the
starting (or base) address of a local variable (excluding parameters) v ∈ V on the stack as
\[ \mathit{lva}(v, ba, f, \mathit{info}_{cil}) \stackrel{\mathrm{def}}{\equiv} ba -_{32} (\mathit{info}_{cil}.\mathit{off}_{lvar}(v, f))_{32} \]
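Definition 6.15 (Function Parameter Address on C-IL Stack). For ba ∈ B32, f ∈ Fname, an
index of the function parameter \( j \in [1 : \pi_{cil}.\mathcal{F}(f).\mathit{npar}] \), and \( (v_{f,j}, t_{f,j}) \equiv \pi_{cil}.\mathcal{F}(f).V[j] \) we
define
\[ \mathit{para}^{\pi_{cil},\theta}(j, ba, f) \stackrel{\mathrm{def}}{\equiv} ba +_{32} \Big( 2 \cdot 4 + \sum_{k=1}^{j-1} \mathit{size}_\theta(\mathit{qt2t}(t_{f,k})) \Big)_{32} \]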
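The size and address computations for parameters can be illustrated with a short Python sketch. It assumes that the sizes of the parameter types (the values size_θ(qt2t(t_{f,k}))) are already given as a list of byte counts in declaration order; this list, the function names, and the integer encoding of addresses are assumptions of the sketch.

```python
def size_pars(sizes, npar):
    """Total bytes needed for the first npar entries (the parameters)."""
    return sum(sizes[:npar])

def param_address(ba, sizes, j):
    """Address of the j-th parameter (1-based): skip the two words holding
    pbp and ra above the frame base, then all preceding parameters."""
    return (ba + 2 * 4 + sum(sizes[:j - 1])) % 2**32
```

For example, with a frame base address of 0x1000 and parameter sizes 4, 4 and 8 bytes, the first parameter is located at 0x1008 and the third one at 0x1010.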
Definition 6.16 (Address of a Caller-Save Register on C-IL Stack). Let loc ∈ N be a location of
a function call in a C-IL function f ∈ Fname. Then the memory address at which a content of a
caller-save register r ∈ Regcaller is saved on the stack during this function call is computed as
crrapicil,θ(r, ba, f, loc, infocil)
def≡ bin32 (crrba − infocil.offcrreg(r, f, loc))
143
6 Compiler Correctness and Justification of Concurrent Mixed Model
...
pbp
parameters
C-IL 
frame 
local variables
calle-save registers
temporary region
ra
...
ba
return destination
... caller-save registers
crra
crrba register r
Figure 6.5: Address of a caller-save register on C-IL Stack.
where crrba is the address (as a number) of the upper bound of the caller-save region (see
Figure 6.5) such that
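\[ \mathit{crrba} \equiv \langle ba \rangle - \mathit{size}^{\pi_{cil},\theta}_{lvars}(f) - 4 \cdot \#\mathit{Reg}_{callee} - \mathit{info}_{cil}.\mathit{size}_{tmp}(f, loc + 1) - 1 \]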
Note that this computation is used only for non-topmost frames. Moreover, as we mentioned
before, the size of the temporary region is determined at the location after the function call.
Definition 6.17 (Computation of C-IL Stack Frame Components from Memory). Additionally,
given an ISA memory m : B32 → B8, we can retrieve the following components from the C-IL
stack in the memory:
• Destination address (binary word) for the return value in case of a call from C-IL.
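Using a base address ba′ of a frame for a function/procedure f′ as depicted in Figure 6.6,
we get the destination address stored in the previous C-IL frame as
\[
\mathit{rdsw}^{\pi,\theta}(m, ba', f') \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{item}\bigl(m, ba', 2 + \mathit{size}^{\pi_{cil},\theta}_{pars}(f')/4\bigr) & : f' \in \mathit{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \wedge \neg \mathit{ext}(f', \pi_{cil}, \theta) \\
\mathit{item}\bigl(m, ba', 2 + \pi_{\mu}(f').\mathit{npar}\bigr) & : f' \in \mathit{dom}(\pi_{\mu}) \wedge \neg \mathit{ext}(f', \pi_{\mu})
\end{cases}
\]
where the number 2 corresponds to ra and pbp on the stack.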
• Callee-saved registers stored in a C-IL frame:
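\[ \mathit{gpr}^{callee}_{\pi_{cil},\theta}(m, ba, f) : \mathit{Reg}_{callee} \to \mathbb{B}^{32} \]
such that for each sv_i ∈ Reg_callee
\[ \mathit{gpr}^{callee}_{\pi_{cil},\theta}(m, ba, f)(sv_i) \overset{\text{def}}{\equiv} \mathit{items}_{\downarrow}\Bigl(m,\; ba -_{32} \bigl(\mathit{size}^{\pi_{cil},\theta}_{lvars}(f) + 4\bigr)_{32},\; \#\mathit{Reg}_{callee}\Bigr)[i] \]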
Recall that for C-IL we require that all callee-save registers are saved on the stack, whereas
for MASM – only if the MASM procedure is supposed to be called from C-IL.
6.1 Compiler Correctness for Sequential Mixed Machine
[Figure 6.6: Destination address for the return value. Labels in the figure: the C-IL frame for f with base address ba (pbp, ra, parameters, local variables, callee-save registers, caller-save registers, temporary region), the return destination word rdsw, and the C-IL/MASM frame for f′ with base address ba′ (parameters, pbp, ra).]
6.1.1.3 Mixed Compiler Information and Stack
Definition 6.18 (MX Compiler Information). Now, we combine the compiler information for
C-IL and MASM into a tuple
\[ \mathit{infoT}_{MX} \overset{\text{def}}{\equiv} \mathit{infoT}_{CIL} \times \mathit{infoT}_{MASM} \]
In this chapter we will often use info = (info_cil, info_µ) ∈ infoT_MX.
Definition 6.19 (Code Region of MX Machine). The code region is determined by a set of
byte addresses at which the compiled code resides in the memory. Given cba = (cbacil, cbaµ) ∈
B32 × B32 and info we define
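\[
\begin{aligned}
A^{code}_{CIL}(\mathit{info}_{cil}, cba_{cil}) &\overset{\text{def}}{\equiv} \{cba_{cil}\}_{4 \cdot |\mathit{info}_{cil}.code|} \\
A^{code}_{MASM}(\mathit{info}_{\mu}, cba_{\mu}) &\overset{\text{def}}{\equiv} \{cba_{\mu}\}_{4 \cdot |\mathit{info}_{\mu}.code|} \\
A^{code}_{MX}(\mathit{info}, cba) &\overset{\text{def}}{\equiv} A^{code}_{CIL}(\mathit{info}_{cil}, cba_{cil}) \cup A^{code}_{MASM}(\mathit{info}_{\mu}, cba_{\mu})
\end{aligned}
\]
Obviously we require that both code regions do not overlap:
\[ \mathit{valid}^{code}_{MX}(\mathit{info}, cba) \overset{\text{def}}{\equiv} A^{code}_{CIL}(\mathit{info}_{cil}, cba_{cil}) \cap A^{code}_{MASM}(\mathit{info}_{\mu}, cba_{\mu}) = \emptyset \]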
Note that the code base address is the byte address of the first byte of the compiled code, i.e.,
its lowest address in memory. In contrast, the stack base address is the highest byte address of
the stack because the stack grows down from higher to lower memory.
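As an illustration of Definition 6.19, the following Python sketch builds the two code regions as sets of byte addresses and checks that they are disjoint. It is a sketch under assumptions: the notation {a}_n is read as n consecutive byte addresses starting at a (with wrap-around modulo 2^32 assumed here), and only the number of instructions |info.code| is passed in rather than a full compiler information record. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def region(base, size):
    """{base}_size: `size` consecutive byte addresses starting at `base`.

    Wrapping modulo 2**32 is an assumption of this sketch.
    """
    return {(base + k) & MASK32 for k in range(size)}


def code_region_cil(n_instr_cil, cba_cil):
    # every compiled instruction occupies 4 bytes
    return region(cba_cil, 4 * n_instr_cil)


def code_region_masm(n_instr_masm, cba_masm):
    return region(cba_masm, 4 * n_instr_masm)


def code_region_mx(n_instr_cil, cba_cil, n_instr_masm, cba_masm):
    """Union of the C-IL and MASM code regions."""
    return code_region_cil(n_instr_cil, cba_cil) | code_region_masm(n_instr_masm, cba_masm)


def valid_code_mx(n_instr_cil, cba_cil, n_instr_masm, cba_masm):
    """True iff the two code regions do not overlap."""
    return code_region_cil(n_instr_cil, cba_cil).isdisjoint(
        code_region_masm(n_instr_masm, cba_masm)
    )


if __name__ == "__main__":
    print(valid_code_mx(4, 0x1000, 2, 0x1010))
    print(valid_code_mx(4, 0x1000, 2, 0x100C))
```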
Before we provide the computation of the distance between base pointers of the frames in the
overall stack, we introduce auxiliary definitions and a unified notation.
Definition 6.20 (Full Stack of Mixed Machine). The full stack of the mixed machine (or MX
stack) in a configuration c ∈ C_MX is a sequence of stacks of all contexts in c. In order to extract
such a stack from the given MX configuration we use the following shorthands and definitions:
• the stack of a context k ∈ context_CIL ∪ context^active_MASM ∪ context^inactive_MASM
\[
st(k) \overset{\text{def}}{\equiv}
\begin{cases}
k.\mathit{stack} & : \mathit{masm}(k) \\
k & : \text{otherwise}
\end{cases}
\]
• the full stack of a sequence of inactive contexts k ∈ (context_CIL ∪ context^inactive_MASM)∗ is
st_ic(k) ∈ (frame_MASM ∪ frame_CIL)∗
\[
st_{ic}(k) \overset{\text{def}}{\equiv}
\begin{cases}
\varepsilon & : k = \varepsilon \\
st_{ic}(k[1 : ni - 1]) \circ st(k[ni]) & : \text{otherwise}
\end{cases}
\]
where ni ≡ |k|
• the stack of the MX configuration
\[ st_{MX}(c) \overset{\text{def}}{\equiv} st_{ic}(c.ic) \circ st(c.ac) \]
To keep the formal definitions in a similar shape throughout this work, we now apply the same
overloaded shorthands for components of an MX stack st ∈ (frame_MASM ∪ frame_CIL)∗ as was
done for MASM and C-IL:
• the index of the topmost frame: top(st) ≡ |st|
• any component X ∈ {f, rds, loc, ME, p, pars, saved, lifo} of a stack frame with an index
i ∈ N_top(st):
\[ X_i(st) \equiv st[i].X \]
• the components of the topmost frame
\[ X_{top}(st) \equiv X_{top(st)}(st) \]
• predicates checking whether a frame with an index i in the MX stack is of the C-IL or
MASM type:
\[
\begin{aligned}
\mathit{cil}(st, i) &\overset{\text{def}}{\equiv} (st[i] \in \mathit{frame}_{CIL}) \\
\mathit{masm}(st, i) &\overset{\text{def}}{\equiv} (st[i] \in \mathit{frame}_{MASM})
\end{aligned}
\]
• the components Y ∈ {rettype, npar, V, P, body, uses} of the function/procedure table entry for
a function/procedure in the i-th frame:
\[
Y_i(st, \pi) \equiv
\begin{cases}
\pi_{cil}.\mathcal{F}(f_i(st)).Y & : \mathit{cil}(st, i) \\
\pi_{\mu}(p_i(st)).Y & : \mathit{masm}(st, i)
\end{cases}
\]
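The construction of the MX stack in Definition 6.20 amounts to concatenating the stacks of all inactive contexts and then the stack of the active context. The Python sketch below shows this under simplifying assumptions: frames carry only a name, a C-IL context is represented directly by its list of frames (as in the text), and a MASM context is a record with a `stack` field. The classes `CilFrame`, `MasmFrame`, and `MasmContext` are hypothetical stand-ins for the formal types.

```python
from dataclasses import dataclass


@dataclass
class CilFrame:
    f: str  # name of the C-IL function


@dataclass
class MasmFrame:
    p: str  # name of the MASM procedure


@dataclass
class MasmContext:
    stack: list  # list of MasmFrame objects


def st(k):
    """Stack of a single context: k.stack for MASM, k itself for C-IL."""
    return k.stack if isinstance(k, MasmContext) else k


def st_ic(contexts):
    """Concatenated stacks of a sequence of inactive contexts.

    The loop is the iterative form of the recursive definition.
    """
    frames = []
    for k in contexts:
        frames.extend(st(k))
    return frames


def st_mx(ic, ac):
    """Full MX stack: inactive contexts first, the active context last."""
    return st_ic(ic) + list(st(ac))


def top(stack):
    return len(stack)


def cil(stack, i):
    return isinstance(stack[i - 1], CilFrame)  # frame indices are 1-based


def masm(stack, i):
    return isinstance(stack[i - 1], MasmFrame)
```

With `ic = [[CilFrame("main")], MasmContext([MasmFrame("p1"), MasmFrame("p2")])]` and `ac = [CilFrame("g")]`, the call `st_mx(ic, ac)` yields a four-frame stack whose topmost frame belongs to `g`.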
[Figure 6.7: Computation of distances (a) dist_CIL-MASM and (b) dist_MASM-CIL in MX stack. Panel (a) shows a C-IL frame i (pbp, parameters, return address, local variables, callee-save registers, caller-save registers, return destination, temporary region) below a MASM frame i+1 (pbp, parameters, return address, saved register area, lifo). Panel (b) shows a MASM frame i (pbp, parameters, return address, saved register area, lifo) below a C-IL frame i+1 (pbp, parameters, return address, local variables, callee-save registers, caller-save registers, return destination, temporary region).]
Definition 6.21 (Distance between Frame Base Addresses in MX Stack). Given an MX program
π ∈ Prog_MX, π = (π_µ, π_cil), the environment parameter θ ∈ Params_CIL, the mixed
machine compiler information info ∈ infoT_MX, info = (info_cil, info_µ), a non-empty MX stack
st ∈ (frame_MASM ∪ frame_CIL)∗, and a stack frame index i ∈ N_top(st), we define the function
\[
\mathit{dist}^{\pi,\theta}_{MX}(st, i, \mathit{info}) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{dist}^{\pi_{cil},\theta}_{CIL}(st[i], 1, \mathit{info}_{cil}) & : i = \mathit{top}(st) \wedge \mathit{cil}(st, i) \\
\mathit{dist}_{MASM}(st[i], 1) & : i = \mathit{top}(st) \wedge \mathit{masm}(st, i) \\
\mathit{dist}^{\pi_{cil},\theta}_{CIL}(st[i : i+1], 1, \mathit{info}_{cil}) & : i < \mathit{top}(st) \wedge \mathit{cil}(st, i) \wedge \mathit{cil}(st, i+1) \\
\mathit{dist}_{MASM}(st[i : i+1], 1) & : i < \mathit{top}(st) \wedge \mathit{masm}(st, i) \wedge \mathit{masm}(st, i+1) \\
\mathit{dist}_{CIL\text{-}MASM} & : i < \mathit{top}(st) \wedge \mathit{cil}(st, i) \wedge \mathit{masm}(st, i+1) \\
\mathit{dist}_{MASM\text{-}CIL} & : i < \mathit{top}(st) \wedge \mathit{masm}(st, i) \wedge \mathit{cil}(st, i+1)
\end{cases}
\]
where the distances dist_CIL-MASM and dist_MASM-CIL between base addresses of two adjacent
frames of different types are computed as depicted in Figure 6.7:
\[
\begin{aligned}
\mathit{dist}_{CIL\text{-}MASM} &\equiv \mathit{dist}^{\pi_{cil},\theta}_{CIL}(st[i], 1, \mathit{info}_{cil}) + \mathit{info}_{cil}.\mathit{size}_{crreg}(f_i(st), loc_i(st) - 1) + 4 + 4 \cdot (\mathit{npar}_{i+1}(st, \pi) + 2) \\
\mathit{dist}_{MASM\text{-}CIL} &\equiv \mathit{dist}_{MASM}(st[i], 1) + \mathit{size}^{\pi_{cil},\theta}_{pars}(f_{i+1}(st)) + 4 \cdot 2
\end{aligned}
\]
Finally, we can define the computation of base addresses of each frame in the mixed stack.
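Definition 6.22 (Frame Base Addresses of MX Stack). Given the stack base address sba ∈ B32
and the arguments as in Definition 6.21, we compute the base address of the MX stack frame
with an index i ∈ N_top(st) with the help of the following functions:
\[
\widehat{\mathit{base}}^{\pi,\theta}_{MX}(st, i, sba, \mathit{info}) \overset{\text{def}}{\equiv}
\begin{cases}
\langle sba \rangle - \mathit{size}^{\pi_{cil},\theta}_{pars}(f_i(st)) - 4 \cdot 2 + 1 & : i = 1 \wedge \mathit{cil}(st, i) \\
\langle sba \rangle - 4 \cdot |\mathit{pars}_i(st)| - 4 \cdot 2 + 1 & : i = 1 \wedge \mathit{masm}(st, i) \\
\widehat{\mathit{base}}^{\pi,\theta}_{MX}(st, i-1, sba, \mathit{info}) - \mathit{dist}^{\pi,\theta}_{MX}(st, i-1, \mathit{info}) & : i \in [2 : \mathit{top}(st)]
\end{cases}
\]
\[ \mathit{base}^{\pi,\theta}_{MX}(st, i, sba, \mathit{info}) \overset{\text{def}}{\equiv} \mathit{bin}_{32}\Bigl(\widehat{\mathit{base}}^{\pi,\theta}_{MX}(st, i, sba, \mathit{info})\Bigr) \]
where 1 is added in the two cases for the first frame (i = 1) because sba points to the last byte of
the first word on the stack.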
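The recursion of Definition 6.22 can be unrolled into a simple loop. The following Python sketch assumes that the frame distances dist_MX(st, i, info) have already been computed and are supplied as a list, and that the parameter area of the first frame is given in bytes; `frame_bases` and its arguments are hypothetical names, and masking with 2^32 − 1 stands in for bin_32 (which assumes that the intermediate natural numbers do not become negative).

```python
MASK32 = 0xFFFFFFFF


def frame_bases(sba, first_frame_param_bytes, dists):
    """Return the base addresses [ba_1, ..., ba_top] of all frames.

    dists[i-1] is assumed to hold dist_MX(st, i, info) for i = 1 .. top-1.
    """
    # case i = 1: skip the parameters and two words (ra, pbp); +1 because
    # sba is the address of the last byte of the first stack word
    base_hat = [sba - first_frame_param_bytes - 4 * 2 + 1]
    # case i in [2 : top]: subtract the distance to the previous frame
    for d in dists:
        base_hat.append(base_hat[-1] - d)
    return [b & MASK32 for b in base_hat]


if __name__ == "__main__":
    # hypothetical stack: first frame with 8 bytes of parameters,
    # then two further frames at distances 32 and 48 bytes
    print([hex(b) for b in frame_bases(0xFFFF, 8, [32, 48])])
```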
Note that we also consider the stack base address as a parameter of the system under which
the compiled code is executed. Initially, the stack base address as well as the stack and base
pointers are set during booting and can then be changed when software multi-threading
is supported. Hence, we do not fix these parameters and state the compiler correctness for any
of them. Later, when we introduce the stack substitution, we will show how these parameters
are changed.
Definition 6.23 (Stack Region of MX Machine). For a given stack base address sba ∈ B32 and
maximal stack size mss ∈ N we define the stack region as the set of byte addresses at which the
bytes of the stack reside in the memory.
First, we compute the value of the stack pointer at the maximal stack extent, i.e., the lowest
stack pointer value that does not cause a stack overflow:
\[ sp_{max}(sba, mss) \overset{\text{def}}{\equiv} sba -_{32} (mss - 1)_{32} \]
Then, using the existing notation we define the stack region
\[ A^{stack}_{MX}(sba, mss) \overset{\text{def}}{\equiv} \{sp_{max}(sba, mss)\}_{mss} \]
Moreover, from the stack st and the stack base address we can compute which address
corresponds to the item on the top of the stack. In fact, this address will be coupled with the
stack pointer in GPRs later on.
\[ sp^{\pi,\theta}_{top}(st, \mathit{info}, sba) \overset{\text{def}}{\equiv} \mathit{base}^{\pi,\theta}_{MX}(st, \mathit{top}(st), sba, \mathit{info}) -_{32} \bigl(\mathit{dist}^{\pi,\theta}_{MX}(st, \mathit{top}(st), \mathit{info})\bigr)_{32} \]
Definition 6.24 (Stack Overflow). A stack overflow is indicated by the predicate
\[ \mathit{stackovf}^{\pi,\theta}_{MX}(st, \mathit{info}, sba, mss) \overset{\text{def}}{\equiv} sp^{\pi,\theta}_{top}(st, \mathit{info}, sba) < sp_{max}(sba, mss) \]
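A minimal Python sketch of Definitions 6.23 and 6.24 follows. It assumes that addresses are plain integers reduced modulo 2^32, that the base address of the topmost frame and its distance value are already known, and that the comparison in the overflow predicate is an unsigned comparison of addresses; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sp_max(sba, mss):
    """Lowest address of the stack region: sba -32 (mss - 1)32."""
    return (sba - (mss - 1)) & MASK32


def stack_region(sba, mss):
    """The mss byte addresses sp_max, ..., sba of the stack region."""
    lo = sp_max(sba, mss)
    return {(lo + k) & MASK32 for k in range(mss)}


def sp_top(base_top, dist_top):
    """Address of the item on top of the stack for the topmost frame."""
    return (base_top - dist_top) & MASK32


def stack_overflow(sp, sba, mss):
    """True iff the top-of-stack address lies below the stack region."""
    return sp < sp_max(sba, mss)


if __name__ == "__main__":
    sba, mss = 0xFFFF, 0x1000
    print(hex(sp_max(sba, mss)), stack_overflow(sp_top(0xF010, 0x20), sba, mss))
```

In this example the stack occupies the addresses 0xF000 to 0xFFFF, so a topmost frame at 0xF010 that needs 0x20 bytes below its base would overflow.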
6.1.2 Sequential Mixed Compiler Correctness
6.1.2.1 Compiler Consistency Relation
In order to guarantee that the SB reduced MIPS-86 machine implements the abstract MX seman-
tics, we introduce the compiler consistency relation coupling configurations of both machines
wrt. the static compiler information. Its definition is split into sub-relations for the program,
stack, registers, etc. Obviously, we will require the compiler consistency to hold only at consis-
tency points defined a bit later in this section.
In the definitions below we consider the following configurations and parameters: c ∈ C_MX,
π = (π_µ, π_cil) ∈ Prog_MX, θ ∈ Params_CIL, d ∈ C_MIPS, info = (info_cil, info_µ) ∈ infoT_MX,
sba ∈ B32, mss ∈ N, cba = (cba_cil, cba_µ) ∈ B32 × B32. Moreover, when we are not interested
in the full MIPS-86 configuration, we will directly use its memory component m.
Definition 6.25 (MX Code Consistency). The MX compiler consistency includes the code con-
sistency for the MASM and C-IL machines and for each of them requires that (i) the assembly
code in the corresponding compiler information is obtained via the compilation of the given
program, and (ii) each assembled instruction of this code has its binary representation residing
in the MIPS-86 memory.
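Formally, we define the relations for π_µ and π_cil separately as
\[
\mathit{consis}^{code}_{MASM}(m, \pi_{\mu}, \mathit{info}_{\mu}, cba_{\mu}) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & \mathit{info}_{\mu} = \mathit{cmpl}_{MASM}(\pi_{\mu}) \\
\text{(ii)} & \forall j \in \mathbb{N}_{|\mathit{info}_{\mu}.code|}.\; \mathit{code}_{ASM}\bigl(\mathit{info}_{\mu}.code[j]\bigr) = m_4(ad^{j}_{\mu})
\end{array}
\]
with ad^j_µ ≡ cba_µ +_32 (4 · (j − 1))_32
\[
\mathit{consis}^{code}_{CIL}(m, \pi_{cil}, \mathit{info}_{cil}, cba_{cil}) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & \mathit{info}_{cil} = \mathit{cmpl}_{CIL}(\pi_{cil}) \\
\text{(ii)} & \forall j \in \mathbb{N}_{|\mathit{info}_{cil}.code|}.\; \mathit{code}_{ASM}\bigl(\mathit{info}_{cil}.code[j]\bigr) = m_4(ad^{j}_{cil})
\end{array}
\]
with ad^j_cil ≡ cba_cil +_32 (4 · (j − 1))_32
The full MX code consistency combines them into
\[
\mathit{consis}^{code}_{MX}(m, \pi, \mathit{info}, cba) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & \mathit{consis}^{code}_{MASM}(m, \pi_{\mu}, \mathit{info}_{\mu}, cba_{\mu}) \\
\text{(ii)} & \mathit{consis}^{code}_{CIL}(m, \pi_{cil}, \mathit{info}_{cil}, cba_{cil})
\end{array}
\]
Moreover, for the given full MIPS-86 configuration we overload the definition:
\[ \mathit{consis}^{code}_{MX}(d, \pi, \mathit{info}, cba) \overset{\text{def}}{\equiv} \mathit{consis}^{code}_{MX}(d.m, \pi, \mathit{info}, cba) \]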
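Condition (ii) of the code consistency can be checked mechanically once the instructions are available in encoded form. The Python sketch below makes several assumptions that are not fixed by the text: the memory is a sparse dictionary from byte addresses to byte values, the encoding code_ASM has already been applied so that the code is a list of 32-bit words, and words are stored little-endian. Condition (i), which relates the compiler information to the program, is not modelled. All names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def store_word(m, addr, word):
    """Write a 32-bit word into byte memory m (little-endian, an assumption)."""
    for k in range(4):
        m[(addr + k) & MASK32] = (word >> (8 * k)) & 0xFF


def read_word(m, addr):
    """m_4(addr): the four bytes starting at addr, combined into one word."""
    return sum(m.get((addr + k) & MASK32, 0) << (8 * k) for k in range(4))


def consis_code_bytes(m, encoded_code, cba):
    """Condition (ii): the j-th encoded instruction resides at cba + 4 * (j - 1)."""
    return all(
        read_word(m, (cba + 4 * (j - 1)) & MASK32) == word
        for j, word in enumerate(encoded_code, start=1)
    )


if __name__ == "__main__":
    code = [0x8C010004, 0x00000000, 0xAC020008]  # arbitrary example words
    mem = {}
    for j, w in enumerate(code):
        store_word(mem, 0x400 + 4 * j, w)
    print(consis_code_bytes(mem, code, 0x400))
```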
Definition 6.26 (MX Memory Consistency). The MX memory consistency states that the mem-
ory content of both machines is equal except the code and stack regions, as well as some ad-
dresses icm ⊂ Ahyp at which the memory consistency might not hold because of environment
steps.
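\[ \mathit{consis}^{mem}_{MX}(c, d, \mathit{info}, cba, sba, mss, icm) \overset{\text{def}}{\equiv} \forall a \in \mathbb{B}^{32} \setminus \bigl(A^{code}_{MX}(\mathit{info}, cba) \cup A^{stack}_{MX}(sba, mss) \cup icm\bigr).\; d.m(a) = c.M(a) \]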
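The memory consistency of Definition 6.26 compares both memories everywhere except on an excluded set of addresses. A Python sketch, assuming sparse dictionary memories in which absent addresses hold the byte 0 (so that only explicitly stored addresses need to be compared) and an `excluded` set that already contains the code region, the stack region, and icm; the function name is hypothetical:

```python
def consis_mem(d_m, c_M, excluded):
    """True iff d_m and c_M agree on every address outside `excluded`.

    d_m and c_M map byte addresses to byte values; a missing address is
    assumed to hold 0, which makes the check finite.
    """
    addresses = (set(d_m) | set(c_M)) - set(excluded)
    return all(d_m.get(a, 0) == c_M.get(a, 0) for a in addresses)


if __name__ == "__main__":
    isa_mem = {0x100: 7, 0x200: 1, 0x300: 9}
    mx_mem = {0x100: 7, 0x200: 2}
    # 0x200 and 0x300 differ, but both are excluded here
    print(consis_mem(isa_mem, mx_mem, {0x200, 0x300}))
```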
[Figure 6.8: The position of the stack frame j containing the values of c.ic[k].gpr_callee. Labels in the figure: the frames of k inactive contexts, from the 1st frame to the (j−1)-th frame, ending with the MASM frames of c.ic[k]; the j-th frame is the first of the C-IL frames of c.ic[k+1] or c.ac.]
Now, using the shorthands st ≡ st_MX(c) and ba_q ≡ base^{π,θ}_MX(st, q, sba, info) for any q ∈ N_top(st),
we define the sub-relations based on the stack configuration.
Definition 6.27 (Frame Base and Stack Pointer Consistency). By the frame base and stack
pointer consistency we require that the value of the frame base pointer in the GPR is equal to
the base address of the topmost frame of the MX stack. The stack pointer, in turn, must point to
the item on the top of the stack.
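\[
\mathit{consis}^{bp/sp}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & d.cpu.core.gpr(bp) = ba_{top(st)} \\
\text{(ii)} & d.cpu.core.gpr(sp) = sp^{\pi,\theta}_{top}(st, \mathit{info}, sba)
\end{array}
\]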
The content of all other registers except the program counter is covered by the registers con-
sistency.
Definition 6.28 (MX Register Consistency). The register consistency demands that (i) the GPRs of
the active MASM context are coupled with the corresponding registers of the core configuration,
(ii) the contents of all SPRs in both machines are always equal, and (iii) the callee-save registers
of inactive MASM contexts in the MX machine configuration are stored in the MIPS-86 machine
memory on the stack corresponding to the next context (see footnote 3).
\[
\mathit{consis}^{reg}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & \mathit{masm}(c.ac) \Longrightarrow \forall r \in \mathbb{B}^{5} \setminus \{sp, bp, ra\}.\; d.cpu.core.gpr(r) = c.ac.gpr(r) \\
\text{(ii)} & d.cpu.core.spr = c.spr \\
\text{(iii)} & \forall k \in \mathbb{N}_{|c.ic|}.\; \mathit{masm}(c.ic[k]) \Longrightarrow c.ic[k].gpr_{callee} = \mathit{gpr}^{callee}_{\pi_{cil},\theta}(d.m, ba_j, f_j(st))
\end{array}
\]
where j is the index of the first stack frame after the context c.ic[k] on the stack st such that
\[ j \equiv 1 + \sum_{i=1}^{k} \mathit{nf}_{cntx}(c.ic[i]) \]
(see Figure 6.8).
Footnote 3: This part of the MX consistency was missing in the original work [Sha12].
Definition 6.29 (MX Control Consistency). The control consistency demands that (i) the program
counter of the processor core points to the code address of the current C-IL statement or
MASM instruction depending on the MX context, and (ii) each code return address saved on
the stack is either the code address of the next MASM statement after the procedure/function
call, or the starting address of the epilogue in the C-IL function call.
\[
\mathit{consis}^{ctrl}_{MX}(c, d, \pi, \theta, \mathit{info}, cba, sba) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & d.cpu.core.pc =
\begin{cases}
\mathit{ca}_{MASM}\bigl(p_{top}(st), loc_{top}(st), \mathit{info}_{\mu}, cba_{\mu}\bigr) & : \mathit{masm}(st, \mathit{top}(st)) \\
\mathit{ca}_{CIL}\bigl(f_{top}(st), loc_{top}(st), \mathit{info}_{cil}, cba_{cil}\bigr) & : \mathit{cil}(st, \mathit{top}(st))
\end{cases} \\[2ex]
\text{(ii)} & \forall i \in [2 : \mathit{top}(st)].\; ra(m, ba_i) =
\begin{cases}
\mathit{ca}_{MASM}\bigl(p_{i-1}(st), loc_{i-1}(st), \mathit{info}_{\mu}, cba_{\mu}\bigr) & : \mathit{masm}(st, i) \\
\mathit{rca}_{CIL}\bigl(f_{i-1}(st), loc_{i-1}(st) - 1, \mathit{info}_{cil}, cba_{cil}\bigr) & : \mathit{cil}(st, i)
\end{cases}
\end{array}
\]
The rest of the abstract MX stack configuration is covered by the stack compiler consistency
guaranteeing that the stack is properly implemented in the MIPS-86 memory.
Definition 6.30 (Stack Consistency). Before we define the stack consistency, we introduce the
consistency relations for the return destination and local memory of a single frame.
For a non-topmost frame i the return destination consistency requires that the return value
destination address on the stack is either a global memory address, or an address of a corresponding
local variable / parameter (see footnote 4).
\[
\mathit{consis}^{rds_i}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \overset{\text{def}}{\equiv} rds_i(st) \neq \bot \Longrightarrow
\mathit{rdsw}^{\pi,\theta}(d.m, ba_{i+1}, F_{i+1}) =
\begin{cases}
a & : rds_i(st) = \mathbf{val}(a, t) \\
\mathit{lva}(v, ba_i, f_i(st), \mathit{info}_{cil}) +_{32} o_{32} & : rds_i(st) = \mathbf{lref}((v, o), i, t') \wedge (v, t'') = V_i(st, \pi)[j] \wedge j > \mathit{npar}_i(st, \pi) \\
\mathit{para}^{\pi_{cil},\theta}(j, ba_i, f_i(st)) +_{32} o_{32} & : rds_i(st) = \mathbf{lref}((v, o), i, t') \wedge (v, t'') = V_i(st, \pi)[j] \wedge j \leq \mathit{npar}_i(st, \pi)
\end{cases}
\]
where
\[
F_{i+1} \equiv
\begin{cases}
p_{i+1}(st) & : \mathit{masm}(st, i+1) \\
f_{i+1}(st) & : \mathit{cil}(st, i+1)
\end{cases}
\]
Note that we do not make restrictions on t′ and t′′ because the configuration c is well-formed.
In the simplest case one has t′ = ptr(t′′).
The consistency relation for the local memory of a frame i shows where the values of its local
variables and parameters reside in the MIPS-86 configuration. Particularly, a local variable or
a parameter of a topmost frame can be either in a processor register or on the physical stack
where it is originally allocated. As for non-topmost frames, if the value is supposed to be in
the register after the return from a callee, it must be kept in the corresponding region for callee-
or caller-save registers on the stack during the callee’s execution. Otherwise, as in the previous
case, it resides on the stack at the allocated address.
Footnote 4: The fact that the return value can be stored into a variable used as a parameter in the
function was not taken into account in [Bau14b].
Let the following shorthands denote
• a local variable/parameter declared in the body of the function f_i(st)
\[ (v_{i,j}, t_{i,j}) \equiv V_i(st, \pi)[j] \]
• a register (if defined) containing its value either at the moment (for the topmost frame) or
after the return from the callee f_{i+1}(st) (for non-topmost frames)
\[ r_{i,j} \equiv \mathit{info}_{cil}.\mathit{reg}_{lvar}(v_{i,j}, f_i(st), loc_i(st)) \]
• a memory address at which during a function/procedure call the content of r_{i,j} is saved
on the stack if the register is a caller-save one
\[ \mathit{crra}_{i,j} \equiv \mathit{crra}^{\pi_{cil},\theta}(r_{i,j}, ba_i, f_i(st), loc_i(st) - 1, \mathit{info}_{cil}) \]
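Then, this consistency relation is defined as
\[
\mathit{consis}^{lm_i}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \overset{\text{def}}{\equiv} \forall j \in \mathbb{N}_{|V_i(st,\pi)|}.\;
\mathcal{M}_{\mathcal{E}\,i}(st)(v_{i,j}) =
\begin{cases}
d.cpu.core.gpr(r_{i,j}) & : i = \mathit{top}(st) \wedge r_{i,j} \neq \bot \\
\mathit{gpr}^{callee}_{\pi_{cil},\theta}(d.m, ba_{i+1}, f_{i+1}(st))(r_{i,j}) & : i < \mathit{top}(st) \wedge r_{i,j} \in \mathit{Reg}_{callee} \wedge \mathit{cil}(st, i+1) \\
\mathit{saved}^{\pi_{\mu}}_{MASM}(d.m, ba_{i+1}, p_{i+1}(st))(r_{i,j}) & : i < \mathit{top}(st) \wedge r_{i,j} \in \mathit{Reg}_{callee} \wedge \mathit{masm}(st, i+1) \\
d.m_4(\mathit{crra}_{i,j}) & : i < \mathit{top}(st) \wedge r_{i,j} \in \mathit{Reg}_{caller} \\
d.m_{\mathit{size}_{\theta}(\mathit{qt2t}(t_{i,j}))}\bigl(\mathit{para}^{\pi_{cil},\theta}(j, ba_i, f_i(st))\bigr) & : r_{i,j} = \bot \wedge j \leq \mathit{npar}_i(st, \pi) \\
d.m_{\mathit{size}_{\theta}(\mathit{qt2t}(t_{i,j}))}\bigl(\mathit{lva}(v_{i,j}, ba_i, f_i(st), \mathit{info}_{cil})\bigr) & : r_{i,j} = \bot \wedge j > \mathit{npar}_i(st, \pi)
\end{cases}
\]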
Parameters passed through the registers must be only single data words. Moreover, for sim-
plicity we assume that variables and parameters are not supposed to reside in the register zero
after the function/procedure return.
Finally, we can state the stack consistency demanding the following:
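\[
\mathit{consis}^{stack}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \overset{\text{def}}{\equiv} \forall i \in \mathbb{N}_{top(st)}.\;
\begin{array}{ll}
\text{(i)} & i > 1 \Longrightarrow \mathit{pbp}(d.m, ba_i) = ba_{i-1} \\
\text{(ii)} & \mathit{masm}(st, i) \Longrightarrow \\
 & \quad \text{(a)}\; \mathit{pars}_i(st) = \mathit{pars}^{\pi_{\mu}}_{MASM}(d.m, ba_i, p_i(st)) \\
 & \quad \text{(b)}\; \mathit{saved}_i(st) = \mathit{saved}^{\pi_{\mu}}_{MASM}(d.m, ba_i, p_i(st)) \\
 & \quad \text{(c)}\; \mathit{lifo}_i(st) = \mathit{lifo}^{\pi_{\mu}}_{MASM}(d.m, ba_i, p_i(st), |\mathit{lifo}_i(st)|) \\
\text{(iii)} & \mathit{cil}(st, i) \Longrightarrow \\
 & \quad \text{(a)}\; i < \mathit{top}(st) \Longrightarrow \mathit{consis}^{rds_i}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \\
 & \quad \text{(b)}\; \mathit{consis}^{lm_i}_{MX}(c, d, \pi, \theta, \mathit{info}, sba)
\end{array}
\]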
Recall that we already stated the requirement that the prologue of any C-IL function stores
all callee-save registers on the stack.
Definition 6.31 (Software Condition on MASM Procedures). As follows from the definition
of the consistency relation for the local variables, we also assume that any MASM procedure
called from a C-IL function is supposed to store the callee-save registers in the same manner.
We formulate this requirement explicitly as a software condition:
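\[
\mathit{sc}^{prog}_{MX}(\pi, \theta) \overset{\text{def}}{\equiv} \forall p \in \mathit{dom}(\pi_{\mu}).\; \neg \mathit{ext}(p, \pi_{\mu}) \wedge p \in \mathit{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \wedge \mathit{ext}(p, \pi_{cil}, \theta) \Longrightarrow \forall r \in \mathit{Reg}_{callee}.\; \langle r \rangle \in \pi_{\mu}(p).\mathit{uses}
\]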
Definition 6.32 (MX Compiler Consistency Relation). The overall compiler consistency for the
mixed machine is then a conjunction of all sub-relations defined above:
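\[
\mathit{consis}_{MX}(c, d, \pi, \theta, \mathit{info}, cba, sba, mss, icm) \overset{\text{def}}{\equiv}
\begin{array}{ll}
\text{(i)} & \mathit{consis}^{code}_{MX}(d, \pi, \mathit{info}, cba) \\
\text{(ii)} & \mathit{consis}^{ctrl}_{MX}(c, d, \pi, \theta, \mathit{info}, cba, sba) \\
\text{(iii)} & \mathit{consis}^{bp/sp}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \\
\text{(iv)} & \mathit{consis}^{reg}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \\
\text{(v)} & \mathit{consis}^{stack}_{MX}(c, d, \pi, \theta, \mathit{info}, sba) \\
\text{(vi)} & \mathit{consis}^{mem}_{MX}(c, d, \mathit{info}, cba, sba, mss, icm)
\end{array}
\]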
6.1.2.2 Consistency Points
Definition 6.33 (Consistency Points for MX Machine). The mixed machine is at a consis-
tency point if it is going to execute either a MASM instruction or a C-IL statement at a lo-
cation determined by the C-IL compiler as the consistency point. Given an active context
ac ∈ contextCIL ∪ contextactiveMASM and the compiler information info ∈ infoT MX, we formally
define
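\[ \mathit{cp}_{MX}(ac, \mathit{info}) \overset{\text{def}}{\equiv} \mathit{masm}(ac) \vee \bigl(\mathit{cil}(ac) \wedge \mathit{info}_{cil}.cp(f_{top}(ac), loc_{top}(ac))\bigr) \]
Note that since the C-IL active context is represented by the C-IL stack, we directly use ac
instead of st(ac). We also overload the predicate for a given mixed machine configuration c ∈ C_MX
as
\[ \mathit{cp}_{MX}(c, \mathit{info}) \equiv \mathit{cp}_{MX}(c.ac, \mathit{info}) \]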
From the discussion about the C-IL compiler information and the MX consistency relation we
already know that we have to require that the C-IL compiler guarantees consistency points at
locations before and after function/procedure calls. This requirement is important for the MX
compiler correctness and can be extended depending on the context in which our MX semantics
is used. So, in order to be able to argue about the correctness of the kernel thread switch in
the scope of this work, we will also require that the consistency points are present at the first
statement of C-IL functions. This condition is weaker in comparison to [Bau14b] where the
author additionally demanded them at the return statement.
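Definition 6.34 (Requirement on Consistency Points in C-IL Code). The C-IL compiler should
at least guarantee consistency points before and after function/procedure calls as well as at the
first statements in function bodies. Let body_f ≡ F^θ_{π_cil}(f).P and stmt_{f,loc} ≡ body_f[loc]; then
\[
\mathit{valid}^{cp}_{MX}(\pi, \mathit{info}, \theta) \overset{\text{def}}{\equiv} \forall f \in \mathit{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}),\; loc \in [1 : |\mathit{body}_f|].\; \neg \mathit{ext}(f, \pi_{cil}, \theta) \Longrightarrow
\begin{array}{ll}
\text{(i)} & \mathit{stmt}_{f,loc} \in \{e_0 = \mathbf{call}\; e(E),\; \mathbf{call}\; e(E)\} \Longrightarrow \mathit{info}_{cil}.cp(f, loc) \wedge \mathit{info}_{cil}.cp(f, loc + 1) \\
\text{(ii)} & \mathit{info}_{cil}.cp(f, 1)
\end{array}
\]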
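The requirement of Definition 6.34 can be checked by a direct traversal of the function bodies. The Python sketch below is an illustration under assumptions: only non-external functions are passed in, each statement is reduced to a kind tag (the string "call" covers both forms of the call statement), and the compiler's consistency-point predicate info_cil.cp is modelled as a set of (function, location) pairs. The name `valid_cp` and the data layout are hypothetical.

```python
def valid_cp(bodies, cp):
    """Check the consistency-point requirement for all given functions.

    bodies: dict mapping a function name to its list of statement kinds
            (locations are 1-based positions in that list)
    cp:     set of (function name, location) pairs marked as consistency points
    """
    for f, body in bodies.items():
        for loc, stmt in enumerate(body, start=1):
            # (ii) the first statement of the body is a consistency point
            if (f, 1) not in cp:
                return False
            # (i) consistency points before and after every call
            if stmt == "call" and not ((f, loc) in cp and (f, loc + 1) in cp):
                return False
    return True


if __name__ == "__main__":
    bodies = {"f": ["assign", "call", "return"], "g": ["return"]}
    cp = {("f", 1), ("f", 2), ("f", 3), ("g", 1)}
    print(valid_cp(bodies, cp))
```

As in the formal definition, condition (ii) is checked inside the quantification over locations, so a function with an empty body imposes no requirement.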
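The set of ISA instruction addresses of the first instructions in the compiled code of C-IL
statements at consistency points is computed in the following way:
\[
A^{cp}_{CIL}(\pi_{cil}, \mathit{info}_{cil}, \theta, cba_{cil}) \overset{\text{def}}{\equiv}
\left\{ \mathit{ca}_{CIL}(f, loc, \mathit{info}_{cil}, \theta, cba_{cil}) \;\middle|\;
\begin{array}{l}
f \in \mathit{dom}(\mathcal{F}^{\theta}_{\pi_{cil}}) \wedge \neg \mathit{ext}(f, \pi_{cil}, \theta) \wedge {} \\
loc \in [1 : |\mathcal{F}^{\theta}_{\pi_{cil}}(f).P|] \wedge \mathit{info}_{cil}.cp(f, loc)
\end{array}
\right\}
\]
Analogously, such a set of addresses for a MASM program is defined as
\[
A^{cp}_{MASM}(\pi_{\mu}, \mathit{info}_{\mu}, cba_{\mu}) \overset{\text{def}}{\equiv}
\left\{ \mathit{ca}_{MASM}(p, loc, \mathit{info}_{\mu}, cba_{\mu}) \;\middle|\;
\begin{array}{l}
p \in \mathit{dom}(\pi_{\mu}) \wedge \neg \mathit{ext}(p, \pi_{\mu}) \wedge {} \\
loc \in [1 : |\pi_{\mu}(p).\mathit{body}|]
\end{array}
\right\}
\]
Then, for an MX program π, cba = (cba_cil, cba_µ) ∈ B32 × B32, and the compiler information
info = (info_cil, info_µ) ∈ infoT_MX we combine both sets into
\[ A^{cp}_{MX}(\pi, \mathit{info}, \theta, cba) \overset{\text{def}}{\equiv} A^{cp}_{MASM}(\pi_{\mu}, \mathit{info}_{\mu}, cba_{\mu}) \cup A^{cp}_{CIL}(\pi_{cil}, \mathit{info}_{cil}, \theta, cba_{cil}) \]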
Definition 6.35 (Consistency Points for MIPS-86 ISA). The SB reduced MIPS-86
machine with a single core core ∈ C_core is at an MX consistency point if the following predicate
holds:
\[ \mathit{cp}^{MX}_{rMIPS}(core, \pi, \mathit{info}, \theta, cba) \overset{\text{def}}{\equiv} core.pc \in A^{cp}_{MX}(\pi, \mathit{info}, \theta, cba) \]
We also overload the definition for a given configuration d ∈ C_MIPS:
\[ \mathit{cp}^{MX}_{rMIPS}(d, \pi, \mathit{info}, \theta, cba) \equiv \mathit{cp}^{MX}_{rMIPS}(d.cpu.core, \pi, \mathit{info}, \theta, cba) \]
6.1.2.3 Accessed Addresses
Sets of Addresses for MASM. In Section 4.3.1 we have already defined the sets of accessed
addresses for the MIPS-86 model. However, since they rely on the MIPS-86 processor input,
we will not use them directly here. Instead, we reproduce only the needed parts of them,
which keeps the definitions clear and shows the exact computations.
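Definition 6.36 (Memory Addresses Accessed for Reading and Writing by MASM Instruction).
The set of memory byte addresses at which the memory is read during a MASM instruction
execution is
\[
\mathit{reads}_{MASM}(c_{\mu}, \pi_{\mu}) \overset{\text{def}}{\equiv}
\begin{cases}
\{ea(core, I_{\mu})\}_4 & : \mathit{instr}_{next}(c_{\mu}, \pi_{\mu}) \in I^{MASM}_{ASM} \wedge \mathit{load}(I_{\mu}) \\
\emptyset & : \text{otherwise}
\end{cases}
\]
where core ≡ core_MIPS(c_µ) is the processor core constructed from the MASM configuration
and I_µ ∈ B32 is an assembled instruction such that I_µ ≡ I(c_µ, π_µ). Analogously, for the set of
addresses at which the memory is written we define
\[
\begin{aligned}
\mathit{swap}_{MASM}(c_{\mu}, \pi_{\mu}) &\overset{\text{def}}{\equiv} \mathit{cas}(I_{\mu}) \wedge c.gpr(rd(I_{\mu})) = c.\mathcal{M}_4(ea(core, I_{\mu})) \\
\mathit{wr}_{MASM}(c_{\mu}, \pi_{\mu}) &\overset{\text{def}}{\equiv} \mathit{sw}(I_{\mu}) \vee \mathit{locksw}(I_{\mu}) \vee \mathit{swap}_{MASM}(c_{\mu}, \pi_{\mu})
\end{aligned}
\]
\[
\mathit{writes}_{MASM}(c_{\mu}, \pi_{\mu}) \overset{\text{def}}{\equiv}
\begin{cases}
\{ea(core, I_{\mu})\}_4 & : \mathit{instr}_{next}(c_{\mu}, \pi_{\mu}) \in I^{MASM}_{ASM} \wedge \mathit{wr}_{MASM}(c_{\mu}, \pi_{\mu}) \\
\emptyset & : \text{otherwise}
\end{cases}
\]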
Sets of Addresses for C-IL. First, analogously to [Bau14b] we define the computation of the
set of memory byte addresses involved in the expression evaluation.
Definition 6.37 (Global Memory Footprint for Regular Pointer/Array Values). Given a regular
pointer or array value val(a, t) ∈ val_ptr such that t ∈ {ptr(t′), array(t′, n)}, we define a set of
accessed byte addresses (or footprint) for reading or writing a value pointed to by val(a, t) in the
C-IL global memory as
\[
\mathit{fp}_{\theta}(a, t) \overset{\text{def}}{\equiv}
\begin{cases}
\{a\}_{\mathit{size}_{\theta}(t)} & : \neg \mathit{isarray}(t') \\
\emptyset & : \text{otherwise}
\end{cases}
\]
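Definition 6.38 (Global Memory Footprint for Expression Evaluation). The function
A^{π_cil,θ}_{c_cil} : E → 2^{B32} computes a subset of byte addresses accessed during the evaluation of a
given expression wrt. a C-IL configuration c_cil ∈ C_CIL, a program π_cil ∈ Prog_CIL, and the
environment parameters θ in the following way:
• constant: x ∈ val
\[ A^{\pi_{cil},\theta}_{c_{cil}}(x) \overset{\text{def}}{\equiv} \emptyset \]
• variable names: v ∈ V
\[
A^{\pi_{cil},\theta}_{c_{cil}}(v) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{fp}_{\theta}(a, t) & : [\![\&(v)]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a, t) \\
\emptyset & : \text{otherwise}
\end{cases}
\]
Note that [[&(v)]]^{π_cil,θ}_{c_cil} for any global variable v yields only the value val(a, t) with t =
ptr(t′) for some t′ and we do not include this condition on the type in the definition. In
all other cases the evaluation of the address is either undefined or gives a local reference.
• function names: fn ∈ Fname
\[ A^{\pi_{cil},\theta}_{c_{cil}}(fn) \overset{\text{def}}{\equiv} \emptyset \]
• unary operator: e ∈ E, ⊖ ∈ O1
\[ A^{\pi_{cil},\theta}_{c_{cil}}(\ominus e) \overset{\text{def}}{\equiv} A^{\pi_{cil},\theta}_{c_{cil}}(e) \]
• binary operator: e1, e2 ∈ E, ⊗ ∈ O2
\[ A^{\pi_{cil},\theta}_{c_{cil}}(e_1 \otimes e_2) \overset{\text{def}}{\equiv} A^{\pi_{cil},\theta}_{c_{cil}}(e_1) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_2) \]
Note that in the definition we do not rely on the order in which the sub-expressions are
evaluated.
• ternary operator: e, e1, e2 ∈ E
\[
A^{\pi_{cil},\theta}_{c_{cil}}(e \;?\; e_1 : e_2) \overset{\text{def}}{\equiv}
\begin{cases}
A^{\pi_{cil},\theta}_{c_{cil}}(e) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_1) & : \neg \mathit{zero}_{\theta}\bigl([\![e]\!]^{\pi,\theta}_{c}\bigr) \\
A^{\pi_{cil},\theta}_{c_{cil}}(e) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_2) & : \text{otherwise}
\end{cases}
\]
• type cast: t ∈ TQ, e ∈ E
\[ A^{\pi_{cil},\theta}_{c_{cil}}((t)e) \overset{\text{def}}{\equiv} A^{\pi_{cil},\theta}_{c_{cil}}(e) \]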
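• pointer dereferencing: e ∈ E
\[
A^{\pi_{cil},\theta}_{c_{cil}}(*(e)) \overset{\text{def}}{\equiv}
\begin{cases}
A^{\pi_{cil},\theta}_{c_{cil}}(e) \cup \mathit{fp}_{\theta}(a, t) & : [\![e]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a, t) \wedge (\mathit{isptr}(t) \vee \mathit{isarray}(t)) \\
A^{\pi_{cil},\theta}_{c_{cil}}(e) & : \text{otherwise}
\end{cases}
\]
• address of: e ∈ E
\[
A^{\pi_{cil},\theta}_{c_{cil}}(\&(e)) \overset{\text{def}}{\equiv}
\begin{cases}
A^{\pi_{cil},\theta}_{c_{cil}}(e') & : e = *(e') \\
A^{\pi_{cil},\theta}_{c_{cil}}(\&(e')) & : e = (e').f \\
\emptyset & : \text{otherwise}
\end{cases}
\]
• field access: e ∈ E, f ∈ F
\[ A^{\pi_{cil},\theta}_{c_{cil}}((e).f) \overset{\text{def}}{\equiv} A^{\pi_{cil},\theta}_{c_{cil}}(*(\&((e).f))) \]
• size of a type or an expression: x ∈ TQ ∪ E
\[ A^{\pi_{cil},\theta}_{c_{cil}}(\mathbf{sizeof}(x)) \overset{\text{def}}{\equiv} \emptyset \]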
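Definition 6.39 (Memory Addresses Accessed for Reading and Writing during C-IL Statement
Execution). Let the statement to be executed in a C-IL configuration c_cil ∈ C_CIL be
s ≡ stmt_next(c_cil, π_cil). Then we define
\[ \mathit{cascall}^{\pi_{cil},\theta}(c_{cil}) \equiv (s = \mathbf{call}\; e(E_{cas})) \wedge \mathit{isfunc}([\![e]\!]^{\pi_{cil},\theta}_{c_{cil}}, \mathit{cas}, \theta) \]
\[
\mathit{reads}_{CIL}(c_{cil}, \pi_{cil}, \theta) \overset{\text{def}}{\equiv}
\begin{cases}
A_{cas} & : \mathit{cascall}^{\pi_{cil},\theta}(c_{cil}) \\
A^{\pi_{cil},\theta}_{c_{cil}}(e) \cup A_E & : s = \mathbf{call}\; e(E) \wedge \mathit{isfunc}([\![e]\!]^{\pi_{cil},\theta}_{c_{cil}}, f, \theta) \wedge f \neq \mathit{cas} \\
A^{\pi_{cil},\theta}_{c_{cil}}(e) \cup A_E \cup A^{\pi_{cil},\theta}_{c_{cil}}(\&(e_0)) & : s = (e_0 = \mathbf{call}\; e(E)) \\
A^{\pi_{cil},\theta}_{c_{cil}}(\&(e_0)) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_1) & : s = (e_0 = e_1) \\
A^{\pi_{cil},\theta}_{c_{cil}}(e) & : s \in \{\mathbf{ifnot}\; e\; \mathbf{goto}\; l,\; \mathbf{return}\; e\} \\
\emptyset & : \text{otherwise}
\end{cases}
\]
where
\[
\begin{aligned}
E_{cas} &\equiv e_{dest} \circ e_{cmp} \circ e_{exch} \circ e_{ret} \\
A_{cas} &\equiv A^{\pi_{cil},\theta}_{c_{cil}}(*(e_{dest})) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_{cmp}) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_{exch}) \cup A^{\pi_{cil},\theta}_{c_{cil}}(e_{ret}) \\
A_E &\equiv \bigcup_{e' \in E} A^{\pi_{cil},\theta}_{c_{cil}}(e')
\end{aligned}
\]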
Note that in comparison to the definition of the reads-set in [Bau14b], our version here is
simpler and more intuitive in case of the assignment. It simply takes into account the definition
of the C-IL transition function where we always compute the address of the left-hand side e0.
Moreover, we read the memory for the computation of the address during the function call with
the return value.
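For a configuration c_cil ∈ C_CIL in which a call of the intrinsic function cas is going to be
performed, i.e., cascall^{π_cil,θ}(c_cil) holds, we define the following predicate:
\[ \mathit{swap}^{\pi_{cil},\theta}(c_{cil}) \overset{\text{def}}{\equiv} \mathit{read}(\theta, c_{cil}, [\![e_{dest}]\!]^{\pi_{cil},\theta}_{c_{cil}}) = [\![e_{cmp}]\!]^{\pi_{cil},\theta}_{c_{cil}} \]
Then, the set of addresses written during the call of cas depends on the condition swap^{π_cil,θ}(c_cil)
and on whether the global memory or the local memory on the stack is written. In case of a write
on the stack the addresses are not included into the set. Formally, we compute
\[
\mathit{writes}^{cas}_{CIL}(c_{cil}, \pi_{cil}, \theta) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{fp}_{\theta}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) & : \neg \mathit{swap}^{\pi_{cil},\theta}(c_{cil}) \wedge [\![e_{ret}]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) \\
\mathit{fp}_{\theta}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) & : \mathit{swap}^{\pi_{cil},\theta}(c_{cil}) \wedge [\![e_{ret}]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) \wedge [\![e_{dest}]\!]^{\pi_{cil},\theta}_{c_{cil}} \in \mathit{val}_{lref} \\
\mathit{fp}_{\theta}(a_{dest}, \mathbf{ptr}(\mathbf{i32})) & : \mathit{swap}^{\pi_{cil},\theta}(c_{cil}) \wedge [\![e_{ret}]\!]^{\pi_{cil},\theta}_{c_{cil}} \in \mathit{val}_{lref} \wedge [\![e_{dest}]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a_{dest}, \mathbf{ptr}(\mathbf{i32})) \\
\mathit{fp}_{\theta}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) \cup \mathit{fp}_{\theta}(a_{dest}, \mathbf{ptr}(\mathbf{i32})) & : \mathit{swap}^{\pi_{cil},\theta}(c_{cil}) \wedge [\![e_{ret}]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a_{ret}, \mathbf{ptr}(\mathbf{i32})) \wedge [\![e_{dest}]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a_{dest}, \mathbf{ptr}(\mathbf{i32})) \\
\emptyset & : \text{otherwise}
\end{cases}
\]
In turn, the set of written addresses during a C-IL step is obtained as
\[
\mathit{writes}_{CIL}(c_{cil}, \pi_{cil}, \theta) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{fp}_{\theta}(a, t) & : s = (e_0 = e_1) \wedge [\![\&(e_0)]\!]^{\pi_{cil},\theta}_{c_{cil}} = \mathbf{val}(a, t) \wedge (\mathit{isptr}(t) \vee \mathit{isarray}(t)) \\
\mathit{fp}_{\theta}(a, t) & : s = \mathbf{return}\; e \wedge \mathit{top}(c_{cil}) > 1 \wedge rds_{top(c_{cil})-1}(c_{cil}) = \mathbf{val}(a, t) \\
\mathit{writes}^{cas}_{CIL}(c_{cil}, \pi_{cil}, \theta) & : \mathit{cascall}^{\pi_{cil},\theta}(c_{cil}) \\
\emptyset & : \text{otherwise}
\end{cases}
\]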
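Sets of Addresses for MX. Finally, we define the same sets for the whole mixed machine.
Definition 6.40 (Memory Addresses Accessed for Reading and Writing during MX Machine
Step). Let the shorthands be
\[
\begin{aligned}
c_{\mu} &\equiv \mathit{conf}_{MASM}(c) & c_{cil} &\equiv \mathit{conf}_{CIL}(c) \\
\mathit{instr}_{\mu} &\equiv \mathit{instr}_{next}(c_{\mu}, \pi_{\mu}) & \mathit{stmt}_{cil} &\equiv \mathit{stmt}_{next}(c_{cil}, \pi_{cil})
\end{aligned}
\]
Then, we define
\[
\mathit{reads}_{MX}(c, \pi, \theta) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{reads}_{MASM}(c_{\mu}, \pi_{\mu}) & : \mathit{masm}(c.ac) \\
\mathit{reads}_{CIL}(c_{cil}, \pi_{cil}, \theta) & : \mathit{cil}(c.ac)
\end{cases}
\]
For the configuration c such that masm(c.ac) holds we first introduce a predicate indicating
a return from a MASM procedure to a C-IL function:
\[ \mathit{ret}^{CIL}_{MASM}(c, \pi) \overset{\text{def}}{\equiv} \mathit{nf}_{ac}(c) = 1 \wedge c.ic \neq \varepsilon \wedge (\mathit{instr}_{\mu} = \mathbf{ret}) \]
Analogously, for the return from C-IL to MASM we have
\[ \mathit{ret}^{MASM}_{CIL}(c, \pi) \overset{\text{def}}{\equiv} \mathit{nf}_{ac}(c) = 1 \wedge c.ic \neq \varepsilon \wedge (\mathit{stmt}_{cil} = \mathbf{return}\; e) \]
Finally, we compute
\[
\mathit{writes}_{MX}(c, \pi, \theta) \overset{\text{def}}{\equiv}
\begin{cases}
\mathit{writes}_{MASM}(c_{\mu}, \pi_{\mu}) & : \mathit{masm}(c.ac) \wedge \neg \mathit{ret}^{CIL}_{MASM}(c, \pi) \\
\mathit{fp}_{\theta}(a, t) & : \mathit{masm}(c.ac) \wedge \mathit{ret}^{CIL}_{MASM}(c, \pi) \wedge rds_{top}(st_{ic}(c.ic)) = \mathbf{val}(a, t) \\
\mathit{writes}_{CIL}(c_{cil}, \pi_{cil}, \theta) & : \mathit{cil}(c.ac) \wedge \neg \mathit{ret}^{MASM}_{CIL}(c, \pi) \\
\emptyset & : \text{otherwise}
\end{cases}
\]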
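Definition 6.41 (No Access to icm by MX Machine). All addresses accessed during an MX
machine step are combined in
\[ \mathit{accad}_{MX}(c, \pi, \theta) \overset{\text{def}}{\equiv} \mathit{reads}_{MX}(c, \pi, \theta) \cup \mathit{writes}_{MX}(c, \pi, \theta) \]
Again, in order to show that the MX machine does not access a region icm, we define
\[ \mathit{noacc}^{\pi,\theta}_{MX}(c, icm) \overset{\text{def}}{\equiv} \mathit{accad}_{MX}(c, \pi, \theta) \cap icm = \emptyset \]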
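Once the reads and writes sets of a step are known, the predicate of Definition 6.41 is a plain disjointness test. A Python sketch, assuming the two sets have already been computed and are passed in as sets of byte addresses (the function names and the helper `region` are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def region(base, size):
    """`size` consecutive byte addresses starting at `base` (modulo 2**32)."""
    return {(base + k) & MASK32 for k in range(size)}


def accad_mx(reads, writes):
    """All byte addresses accessed during one step."""
    return set(reads) | set(writes)


def noacc_mx(reads, writes, icm):
    """True iff the step touches no address of the region icm."""
    return accad_mx(reads, writes).isdisjoint(icm)


if __name__ == "__main__":
    reads = region(0x2000, 4)   # a word read at 0x2000
    writes = region(0x3000, 4)  # a word written at 0x3000
    print(noacc_mx(reads, writes, region(0x4000, 16)))
    print(noacc_mx(reads, writes, region(0x3002, 8)))
```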
6.1.2.4 Volatile Accesses, IO- and OT -Points
As we already know, C-IL supports the concept of volatile-qualified types. Any access to a
variable/field (or simply volatile object) of such a type is treated specifically by the compiler
but does not influence the abstract semantics. Usually compilers are recommended to refrain
from optimization of such accesses. This means, for example, a read of a volatile variable is
always performed and the compiler cannot rely on the most recently read value because it can
be changed by the environment. Moreover, the compiler can reorder the accesses to non-volatile
variables, but the sequence of volatile accesses is supposed to be preserved.
Obviously, volatile accesses must be detected during the compilation, when it is not always
possible to find the aliasing due to the pointer arithmetic. A typical example is an access to
a volatile variable by a pointer to a non-volatile type. The aliasing could be recognized when
an address assigned to the pointer is computed in the code directly from the volatile variable.
However, if such a computation is based on the address of another non-volatile variable, the
compiler would probably treat this access as non-volatile.
In fact, according to the C11 standard [ISO11], an attempt to refer to a volatile variable/field
via a pointer to a type not qualified as volatile results in undefined behaviour. To avoid
such situations, many compilers introduce restrictions on how volatile variables/fields are
allowed to be accessed. In this work, we also assume that volatile objects are not accessed via
non-volatile pointers, which, in turn, makes it feasible to detect volatile accesses directly from
the type of the pointer. If a non-volatile object is accessed via a pointer to a volatile type, we
will consider such an access also to be volatile, and will treat it here as an IO-operation.
Any access to a volatile object is considered to be an IO-operation. So, in order to define
IO-points for the C-IL machine, we first introduce a function computing a number of volatile
accesses in a given C-IL expression.
Definition 6.42 (Pointer to a Volatile Type Suitable for Memory Access). We say that a qualified
type tq ∈ TQ represented as tq = (q, t) can be used for a volatile access of the C-IL memory
if the following predicate holds
\[ \mathit{volacc}(tq) \overset{\text{def}}{\equiv} t \in \{\mathbf{ptr}(q', t'), \mathbf{array}((q', t'), n)\} \wedge \mathbf{volatile} \in q' \wedge \neg \mathit{isarray}(t') \wedge (t' \neq \mathbf{struct}\; t_C) \]
Note that we will use $\mathit{volacc}(t_q)$ only for machine states from which the computation does
not lead to a run-time error. Therefore, e.g., although an access to a whole volatile variable of
a composite type can be performed during expression evaluation, the transition result for
such an access is the error state, and we do not count here the number of accesses for all fields
of the volatile-qualified composite type. Moreover, we can distinguish between a volatile array
and an array of volatiles. An access to a variable of a volatile array type just computes the
address using the environment parameter $\theta$ and, therefore, is not a volatile access.
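Definition 6.43 (Number of Volatile Accesses in a C-IL Expression). The function
$\mathit{Vol}^{\pi_{cil},\theta}_{f} : E \to \mathbb{N}_0$ computes the number of volatile accesses that could appear during the evaluation of
a given expression in the body of a non-external C-IL function $f \in F_{name}$ of a program
$\pi_{cil} \in \mathit{Prog}_{CIL}$ wrt. the environment parameters $\theta$.
• constant: $x \in \mathit{val}$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(x) \overset{\mathrm{def}}{\equiv} 0 \]
• variable names: $v \in V$
\[
\mathit{Vol}^{\pi_{cil},\theta}_{f}(v) \overset{\mathrm{def}}{\equiv}
\begin{cases}
1 & : \mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{f}(\&v)\big) \\
0 & : \text{otherwise}
\end{cases}
\]
• function names: $fn \in F_{name}$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(fn) \overset{\mathrm{def}}{\equiv} 0 \]
• unary operator: $e \in E$, $\ominus \in O_1$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(\ominus e) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f}(e) \]
• binary operator: $e_1, e_2 \in E$, $\otimes \in O_2$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_1 \otimes e_2) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_1) + \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_2) \]
• ternary operator: $e, e_1, e_2 \in E$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(e\ ?\ e_1 : e_2) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f}(e) \]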
Here it is important to mention that the correct number of volatile accesses in an
expression with the ternary operator can only be computed when $e$ is actually
evaluated. Depending on its value, the number of accesses could be either
$\mathit{Vol}^{\pi_{cil},\theta}_{f}(e) + \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_1)$ or $\mathit{Vol}^{\pi_{cil},\theta}_{f}(e) + \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_2)$. Since we compute this number from
the expression without a given C-IL configuration, we have to restrict the
volatile accesses in the ternary operator. We chose the simplest solution: volatile
accesses are allowed to appear only in $e$, and not in $e_1$ or $e_2$, i.e., we require
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_1) = \mathit{Vol}^{\pi_{cil},\theta}_{f}(e_2) = 0 \]
This definition differs from the version presented in [Bau14b], where the author tested
the expressions $e$, $e_1$, and $e_2$ together for the presence of volatile accesses. However, for
instance, if $e$ and $e_1$ do not contain volatile accesses while $e_2$ does, and during a step the
condition $e$ evaluates to one, the volatile read in $e_2$ would be performed neither in C-IL
nor in the compiled code, although the whole expression with the ternary operator would
contain a volatile access according to [Bau14b]. Therefore, matching IO-points between
the abstract and concrete machines wrt. Definition 2.32 would become a problem.
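• type cast: $t \in T_Q$, $e \in E$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}((t)e) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f}(e) \]
• pointer dereferencing: $e \in E$
\[
\mathit{Vol}^{\pi_{cil},\theta}_{f}(*(e)) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathit{Vol}^{\pi_{cil},\theta}_{f}(e) + 1 & : \mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{f}(e)\big) \\
\mathit{Vol}^{\pi_{cil},\theta}_{f}(e) & : \text{otherwise}
\end{cases}
\]
• address of: $e \in E$
\[
\mathit{Vol}^{\pi_{cil},\theta}_{f}(\&(e)) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathit{Vol}^{\pi_{cil},\theta}_{f}(e') & : e = *(e') \\
\mathit{Vol}^{\pi_{cil},\theta}_{f}(\&(e')) & : e = (e').z \\
0 & : \text{otherwise}
\end{cases}
\]
• field access: $e \in E$, $z \in F$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}((e).z) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f}(*(\&((e).z))) \]
Note that if a composite type is volatile-qualified, then all its fields are treated as volatile
independently of their explicitly given types.
• size of a type or an expression: $x \in T_Q \cup E$
\[ \mathit{Vol}^{\pi_{cil},\theta}_{f}(\mathrm{sizeof}(x)) \overset{\mathrm{def}}{\equiv} 0 \]
If we have to compute the number of volatile accesses during the evaluation of the expression
$e$ in a C-IL configuration $c_{cil} \in C_{CIL}$ we will use
\[ \mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(e) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{f_{top}(c_{cil})}(e) \]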
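The structural recursion of Definition 6.43 can be illustrated by a minimal Python sketch. It is not part of the formal development: the expression classes and the function `vol` are hypothetical names, and the type-based test volacc(τ(...)) is replaced by a precomputed boolean flag stored on the nodes that may touch memory. Type casts, address-of, field access, and sizeof are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str
    volatile: bool = False  # assumption: stands in for volacc(tau(&v))


@dataclass(frozen=True)
class Unary:
    op: str
    e: "Expr"


@dataclass(frozen=True)
class Binary:
    op: str
    e1: "Expr"
    e2: "Expr"


@dataclass(frozen=True)
class Ternary:
    cond: "Expr"
    e1: "Expr"
    e2: "Expr"


@dataclass(frozen=True)
class Deref:
    e: "Expr"
    volatile: bool = False  # assumption: stands in for volacc(tau(e))


Expr = Union[Const, Var, Unary, Binary, Ternary, Deref]


def vol(e: Expr) -> int:
    """Count the volatile accesses of an expression by structural recursion."""
    if isinstance(e, Const):
        return 0
    if isinstance(e, Var):
        return 1 if e.volatile else 0
    if isinstance(e, Unary):
        return vol(e.e)
    if isinstance(e, Binary):
        return vol(e.e1) + vol(e.e2)
    if isinstance(e, Ternary):
        # Restriction from the text: volatile accesses may occur only in the
        # condition, never in one of the two branches.
        if vol(e.e1) != 0 or vol(e.e2) != 0:
            raise ValueError("volatile access in a branch of the ternary operator")
        return vol(e.cond)
    if isinstance(e, Deref):
        return vol(e.e) + (1 if e.volatile else 0)
    raise TypeError("unsupported expression node")
```

For a volatile variable x and a non-volatile variable y, `vol(Binary("+", Var("x", True), Var("y")))` counts one access, while a ternary expression with a volatile access in a branch is rejected instead of being counted.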
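Definition 6.44 (Number of Volatile Reads and Writes in a C-IL Step). Given $c_{cil} \in C_{CIL}$, $\theta$,
and $\pi_{cil}$ we define the number of volatile reads and writes performed during a C-IL step from
the given configuration and without a run-time error. For $s \equiv \mathit{stmt}_{next}(c_{cil}, \pi_{cil})$ we have the
number of volatile reads during a C-IL step as
\[
\mathit{Vol}^{\pi_{cil},\theta}_{read}(c_{cil}) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(\&(e_0)) + \mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(e_1) & : s = (e_0 = e_1) \\
\mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(e) & : s = \mathrm{ifnot}\ e\ \mathrm{goto}\ l \\
c_{vol} & : s = \mathrm{call}\ e(E) \\
c_{vol} + \mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(\&(e_0)) & : s = (e_0 = \mathrm{call}\ e(E)) \\
\mathit{Vol}^{\pi_{cil},\theta}_{c_{cil}}(e) & : s = \mathrm{return}\ e
\end{cases}
\]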
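where the number of volatile accesses for a function call is
\[ c_{vol} \equiv \mathit{Vol}^{\pi_{cil},\theta}_{f}(e) + \sum_{e' \in E} \mathit{Vol}^{\pi_{cil},\theta}_{f}(e') \]
The number of volatile writes for a C-IL step is computed as
\[
\mathit{Vol}^{\pi_{cil},\theta}_{write}(c_{cil}) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\langle vw \rangle & : s = (e_0 = e_1) \\
\langle vw' \rangle & : s = \mathrm{return}\ e \land \mathit{top}(c_{cil}) > 1 \land s' = (e_0 = \mathrm{call}\ e(E)) \\
0 & : \text{otherwise}
\end{cases}
\]
where $s'$ is the caller's C-IL statement, i.e., the call of the function from which the return with
a value is made,
\[ s' \equiv P_{top}(c'_{cil}, \pi_{cil})[\mathit{loc}_{top}(c'_{cil}) - 1] \quad \text{with} \quad c'_{cil} \equiv \mathit{drop}_{frame}(c_{cil}) \]
and $vw, vw' \in \mathbb{B}$ are flags indicating that a write to a volatile type is performed in the
corresponding case:
\[ vw \equiv \mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{c_{cil}}(\&(e_0))\big) \qquad vw' \equiv \mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{c'_{cil}}(\&(e_0))\big) \]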
Note that according to the C-IL semantics, a C-IL step leading to a non-error state can make
at most one write because the C-IL transition function cannot result in a simultaneous write to
several fields of a composite type or several elements of an array. We consider the proof of this
property to be just a bookkeeping exercise and to lie outside the scope of the thesis.
Definition 6.45 (Number of Volatile Accesses during a C-IL Step). We define the total
number of volatile accesses during a C-IL step not leading to the error state as
\[ \mathit{Vol}^{\pi_{cil},\theta}_{acc}(c_{cil}) \overset{\mathrm{def}}{\equiv} \mathit{Vol}^{\pi_{cil},\theta}_{read}(c_{cil}) + \mathit{Vol}^{\pi_{cil},\theta}_{write}(c_{cil}) \]
Any volatile access is considered to be suitable for an IO-operation; therefore, a C-IL step
containing a volatile access is at an IO-point. However, we will also consider a step executing
the function cas to be at an IO-point.
Definition 6.46 (Number of C-IL Global Memory Accesses Suitable for IO-operations). For
a C-IL step we compute the number of global memory accesses suitable for IO-operations as the
number of volatile accesses during the step plus an atomic write performed by the cas function:
\[
\mathit{nIO}^{\pi_{cil},\theta}_{CIL}(c_{cil}) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathit{Vol}^{\pi_{cil},\theta}_{acc}(c_{cil}) + 1 & : \mathit{cascall}^{\pi_{cil},\theta}(c_{cil}) \\
\mathit{Vol}^{\pi_{cil},\theta}_{acc}(c_{cil}) & : \text{otherwise}
\end{cases}
\]
Finally, we can introduce the software condition on the number of global memory accesses
suitable for IO-operations during a C-IL step.
Definition 6.47 (Software Condition on Number of IO-operations per C-IL Step). For a given
C-IL configuration such that $\delta^{\pi_{cil},\theta}_{CIL}(c_{cil}) \neq \bot$ we require
\[ \mathit{sc}^{nIO}_{CIL}(c_{cil}, \pi_{cil}, \theta) \overset{\mathrm{def}}{\equiv} \mathit{nIO}^{\pi_{cil},\theta}_{CIL}(c_{cil}) \leq 1 \]
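Definition 6.48 (IO- and OT-Points for C-IL Machine). Given $c_{cil} \in C_{CIL}$, $\theta$, and $\pi_{cil}$ we define
whether the C-IL machine with the given configuration $c_{cil}$ is at an IO- or OT-point, respectively,
as follows:
\[
\begin{aligned}
\mathcal{IO}^{\pi_{cil},\theta}_{CIL}(c_{cil}) &\overset{\mathrm{def}}{\equiv} \mathit{nIO}^{\pi_{cil},\theta}_{CIL}(c_{cil}) = 1 \\
\mathcal{OT}^{\pi_{cil},\theta}_{CIL}(c_{cil}) &\overset{\mathrm{def}}{\equiv} \mathit{cascall}^{\pi_{cil},\theta}(c_{cil}) \lor \big(\mathit{Vol}^{\pi_{cil},\theta}_{write}(c_{cil}) = 1\big)
\end{aligned}
\]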
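Definitions 6.45 to 6.48 reduce to simple arithmetic once the read and write counts of a step are known. The following Python sketch only illustrates that arithmetic; the function names are hypothetical, and the counts and the cas flag are assumed to be supplied by the caller instead of being derived from a C-IL configuration.

```python
def n_io(vol_reads: int, vol_writes: int, cas_call: bool) -> int:
    """Accesses suitable for IO-operations: volatile accesses plus one for a cas call."""
    return vol_reads + vol_writes + (1 if cas_call else 0)


def sc_n_io(vol_reads: int, vol_writes: int, cas_call: bool) -> bool:
    """Software condition: at most one such access per step."""
    return n_io(vol_reads, vol_writes, cas_call) <= 1


def io_point(vol_reads: int, vol_writes: int, cas_call: bool) -> bool:
    """A step is at an IO-point iff it performs exactly one such access."""
    return n_io(vol_reads, vol_writes, cas_call) == 1


def ot_point(vol_writes: int, cas_call: bool) -> bool:
    """A step is at an OT-point iff it calls cas or performs one volatile write."""
    return cas_call or vol_writes == 1
```

Under this reading a step with one volatile read is an IO-point but not an OT-point, whereas a step with one volatile write or a cas call is both.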
Definition 6.49 (IO- and OT -Points for MX Machine). Let the shorthands be
cµ ≡ confMASM(c) st′ ≡ (stic(c.ic))
ccil ≡ confCIL(c) s′ ≡ Ptop(st′, pi)[loctop(st′)− 1]
Iµ ≡ I(cµ, piµ) f ′ ≡ ftop(st′)
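Then, the predicate $\mathcal{IO}^{\pi,\theta}_{MX}(c)$ indicates that the MX machine in a configuration $c \in C_{MX}$ is in
an IO-point if (i) it is in an IO-point during any C-IL step except the return to MASM, (ii) the
return from C-IL to MASM performs a volatile read, (iii) a MASM step executes an assembly
instruction cas or locksw⁵, or (iv) the return from MASM to C-IL writes to a C-IL return value
destination pointing to a volatile type:
\[
\mathcal{IO}^{\pi,\theta}_{MX}(c) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathcal{IO}^{\pi_{cil},\theta}_{CIL}(c_{cil}) & : \mathit{cil}(c.ac) \land \lnot \mathit{ret}^{MASM}_{CIL}(c, \pi) \\
\mathit{Vol}^{\pi_{cil},\theta}_{read}(c_{cil}) = 1 & : \mathit{cil}(c.ac) \land \mathit{ret}^{MASM}_{CIL}(c, \pi) \\
\mathit{cas}(I_\mu) \lor \mathit{locksw}(I_\mu) & : \mathit{masm}(c.ac) \land \mathit{instr}_{next}(c_\mu, \pi_\mu) \in I^{MASM}_{ASM} \\
\mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{f'}(\&(e_0))\big) & : \mathit{masm}(c.ac) \land \mathit{ret}^{CIL}_{MASM}(c, \pi) \land s' = (e_0 = \mathrm{call}\ e(E)) \\
0 & : \text{otherwise}
\end{cases}
\]
In turn, we define OT-points of the MX machine as
\[
\mathcal{OT}^{\pi,\theta}_{MX}(c) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\mathcal{OT}^{\pi_{cil},\theta}_{CIL}(c_{cil}) & : \mathit{cil}(c.ac) \land \lnot \mathit{ret}^{MASM}_{CIL}(c, \pi) \\
\mathit{cas}(I_\mu) \lor \mathit{locksw}(I_\mu) & : \mathit{masm}(c.ac) \land \mathit{instr}_{next}(c_\mu, \pi_\mu) \in I^{MASM}_{ASM} \\
\mathit{volacc}\big(\tau^{\pi_{cil},\theta}_{f'}(\&(e_0))\big) & : \mathit{masm}(c.ac) \land \mathit{ret}^{CIL}_{MASM}(c, \pi) \land s' = (e_0 = \mathrm{call}\ e(E)) \\
0 & : \text{otherwise}
\end{cases}
\]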
In analogy to the C-IL case, we require a software condition on the number of
IO-accesses per step of the mixed machine to hold.
Definition 6.50 (Software Condition on Number of IO-operations per C-IL Step in the
MX Machine). For a given MX configuration $c \in C_{MX}$ and a program $\pi \in \mathit{Prog}_{MX}$ such that
$\delta^{\pi,\theta}_{MX}(c, in) \neq \bot$ with $in \in \Sigma_{MX}$ we require
\[
\begin{aligned}
\mathit{sc}^{nIO}_{MX}(c, \pi, \theta) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \mathit{cil}(c.ac) \land \lnot \mathit{ret}^{MASM}_{CIL}(c, \pi) \Longrightarrow \mathit{sc}^{nIO}_{CIL}(c_{cil}, \pi_{cil}, \theta) \\
& \text{(ii)}\ \ \mathit{cil}(c.ac) \land \mathit{ret}^{MASM}_{CIL}(c, \pi) \Longrightarrow \mathit{Vol}^{\pi_{cil},\theta}_{read}(c_{cil}) \leq 1
\end{aligned}
\]
Recall that in Definition 6.34 we have already introduced the conditions on the consistency
points in the C-IL code. Now, in addition to them, we require that any IO-point of the MX
machine is a consistency point.
⁵ Here, for simplicity, we do not take into account accesses from the MASM program to volatile variables declared in the
C-IL program.
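Definition 6.51 (MX IO-Points are Consistency Points).
\[ \mathit{IOcp}^{\pi,\theta}_{MX}(c, info) \overset{\mathrm{def}}{\equiv} \mathcal{IO}^{\pi,\theta}_{MX}(c) \Longrightarrow \mathit{cp}_{MX}(c, info) \]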
This property is not needed for the justification of the MX semantics, but required for the
correctness proof of the kernel threads. In fact, it demands that the C-IL compiler inserts consis-
tency points before any volatile access. In case of the MASM steps, this condition trivially holds
because we have consistency points at every MASM instruction.
In order to define IO-points for the SB reduced ISA executing the compiled code of a mixed
program, we first define the set of addresses of instructions implementing volatile accesses. Though
we have already stated the software condition, we provide the definition in a general form, i.e.,
for all volatile accesses in the compiled code of any C-IL statement.
\[
A^{vol}_{CIL}(\pi_{cil}, info_{cil}, \theta, cba_{cil}) \overset{\mathrm{def}}{\equiv}
\left\{ cba_{cil} +_{32} (4 \cdot o)_{32} \;\middle|\;
\begin{aligned}
& \exists f \in F_{name},\ loc \in \mathbb{N}. \\
& f \in \mathrm{dom}(F^{\theta}_{\pi_{cil}}) \land \lnot \mathit{ext}(f, \pi_{cil}, \theta)\ \land \\
& loc \in [1 : |F^{\theta}_{\pi_{cil}}(f).P|]\ \land \\
& o \in info_{cil}.\mathit{off}_{vol}(f, loc)
\end{aligned}
\right\}
\]
Definition 6.52 (IO- and OT-Points for MIPS-86 ISA Executing MX). Let the MIPS-86 instruction
being executed in a configuration $d \in C_{MIPS}$ be $I \equiv d.m_4(d.cpu.core.pc)$, then we define
\[
\begin{aligned}
\mathcal{IO}^{MX}_{rMIPS}(d, \pi, info, \theta, cba) &\overset{\mathrm{def}}{\equiv} \mathit{lw}(I) \land \big(d.cpu.core.pc \in A^{vol}_{CIL}(\pi_{cil}, info_{cil}, \theta, cba_{cil})\big) \lor \mathit{cas}(I) \lor \mathit{locksw}(I) \\
\mathcal{OT}^{MX}_{rMIPS}(d) &\overset{\mathrm{def}}{\equiv} \mathit{cas}(I) \lor \mathit{locksw}(I)
\end{aligned}
\]
Note that the definition basically relies on the core configuration and the code region, and
on no other memory. Moreover, $\mathit{lw}(I)$ is present in the formula in order to show that, due to the
simple store buffer reduction policy, we are interested only in load operations used for volatile
accesses. The predicate could also be stated without this explicit condition, though $A^{vol}_{CIL}$ may
include addresses of other memory instructions implementing volatile accesses in the code.
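As a rough illustration of the address set and of Definition 6.52, consider the following Python sketch. All names are hypothetical: `off_vol` maps a pair (function, location) to the word offsets recorded by the compiler, `body_len` stands in for the body lengths of the functions in the function table, and the instruction predicates lw, cas, and locksw are assumed to be decoded elsewhere and passed in as booleans.

```python
from typing import Dict, Set, Tuple

MASK32 = 0xFFFFFFFF  # 32-bit addresses: the addition wraps modulo 2**32


def a_vol_cil(cba_cil: int,
              off_vol: Dict[Tuple[str, int], Set[int]],
              body_len: Dict[str, int],
              externals: Set[str]) -> Set[int]:
    """Byte addresses of compiled instructions that implement volatile accesses."""
    addrs = set()
    for (f, loc), offsets in off_vol.items():
        # f must be a declared, non-external function
        if f not in body_len or f in externals:
            continue
        # loc must be a valid location in the body of f
        if not 1 <= loc <= body_len[f]:
            continue
        for o in offsets:
            addrs.add((cba_cil + 4 * o) & MASK32)
    return addrs


def io_point_mips(pc: int, is_lw: bool, is_cas: bool, is_locksw: bool,
                  a_vol: Set[int]) -> bool:
    """IO-point: a load at a volatile-access address, or cas, or locksw."""
    return (is_lw and pc in a_vol) or is_cas or is_locksw


def ot_point_mips(is_cas: bool, is_locksw: bool) -> bool:
    """OT-point: only cas and locksw qualify."""
    return is_cas or is_locksw
```

The sketch makes the remark above visible: a store located at an address from the set is not an IO-point because of the explicit lw condition.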
6.1.2.5 Addresses of Global Variables and Constants
We compute the memory byte addresses occupied by the C-IL global variables (including those
whose type is qualified as constant) declared in the program $\pi_{cil}$:
\[
A^{gvar}_{CIL}(\pi_{cil}, \theta) \overset{\mathrm{def}}{\equiv} \bigcup_{(v, t_q) \in \pi_{cil}.V_G} \{\theta.\mathit{alloc}_{gvar}(v)\}_{\mathit{size}_\theta(\mathit{qt2t}(t_q))}
\]
Among these memory bytes we distinguish the memory addresses occupied by elements of
arrays, variables, and fields with types qualified as constants. We collect such byte addresses in
the set $A^{const}_{CIL}(\pi_{cil}, \theta) \in 2^{\mathbb{B}^{32}}$. In order to compute this set we introduce an auxiliary computation.
Let $a \in \mathbb{B}^{32}$ be a byte address at which a global variable, an element of an array, or a struct field of a
qualified type $t_q \equiv (q, t)$, $t_q \in T_Q$ resides in the memory. Then we recursively define
\[
A^{const}_{CIL}(a, t_q, \pi_{cil}, \theta) \overset{\mathrm{def}}{\equiv}
\begin{cases}
\{a\}_{\mathit{size}_\theta(\mathit{qt2t}(t_q))} & : \mathrm{const} \in q \land \lnot \mathit{isarray}(t) \land t \neq \mathrm{void} \\
\bigcup_{i \in [0 : n-1]} A^{const}_{CIL}(a_i, t'_q, \pi_{cil}, \theta) & : t = \mathrm{array}(t'_q, n) \\
\bigcup_{(f, t'_q) \in \pi_{cil}.T_F(t_C)} A^{const}_{CIL}(a_f, t'_q, \pi_{cil}, \theta) & : \mathrm{const} \notin q \land t = \mathrm{struct}\ t_C \\
\emptyset & : \text{otherwise}
\end{cases}
\]
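where the addresses of fields $f$ and elements $i$ of arrays are computed as
\[
a_i \equiv a +_{32} \big(i \cdot \mathit{size}_\theta(\mathit{qt2t}(t'_q))\big)_{32} \qquad a_f \equiv a +_{32} \big(\theta.\mathit{offset}_{struc}(t_C, f)\big)_{32}
\]
Hence for all global variables we compute
\[
A^{const}_{CIL}(\pi_{cil}, \theta) \overset{\mathrm{def}}{\equiv} \bigcup_{(v, t_q) \in \pi_{cil}.V_G} A^{const}_{CIL}(\theta.\mathit{alloc}_{gvar}(v), t_q, \pi_{cil}, \theta)
\]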
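The recursive computation of the constant byte addresses can be sketched in Python as follows. The encoding of types as tuples, the `structs` table with explicit field offsets, and the padding-free `size_of` are assumptions made only for this illustration; in the thesis, sizes and offsets are provided by the environment parameters θ.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical encodings:
#   type:           ("prim", size) | ("void",) | ("array", elem_tq, n) | ("struct", name)
#   qualified type: (frozenset_of_qualifiers, type)
#   structs:        name -> list of (field_name, field_tq, byte_offset)


def size_of(t, structs):
    """Size in bytes; struct layout is taken from the given offsets (no padding rules)."""
    kind = t[0]
    if kind == "prim":
        return t[1]
    if kind == "array":
        _, (_, elem_t), n = t
        return n * size_of(elem_t, structs)
    if kind == "struct":
        return max(off + size_of(ft, structs) for _, (_, ft), off in structs[t[1]])
    raise ValueError("type has no size: %r" % (t,))


def byte_range(a, n):
    """The n consecutive byte addresses starting at a (modulo 2**32)."""
    return {(a + k) & MASK32 for k in range(n)}


def a_const(a, tq, structs):
    """Byte addresses of const-qualified parts of an object of type tq at address a."""
    q, t = tq
    kind = t[0]
    if "const" in q and kind not in ("array", "void"):
        return byte_range(a, size_of(t, structs))
    if kind == "array":
        _, elem_tq, n = t
        step = size_of(elem_tq[1], structs)
        out = set()
        for i in range(n):
            out |= a_const((a + i * step) & MASK32, elem_tq, structs)
        return out
    if "const" not in q and kind == "struct":
        out = set()
        for _, field_tq, off in structs[t[1]]:
            out |= a_const((a + off) & MASK32, field_tq, structs)
        return out
    return set()
```

For a non-constant struct with one constant field, only the bytes of that field are collected, while a constant-qualified struct contributes all of its bytes.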
6.1.2.6 Requirements and Conditions for MIPS-86 Machine
Now, we define requirements on steps of the SB reduced MIPS-86 machines which will enable
the simulation for the MX model.
Definition 6.53 (Suitability of Inputs for Reduced MIPS-86 Executing MX). We consider an
input $in \in \Sigma_{MIPS}$ to be suitable for the MX simulation when the reset signal is low:
\[ \mathit{suit}^{MX}_{rMIPS}(in) \overset{\mathrm{def}}{\equiv} in = (core, w_I, w_D, eev) \Longrightarrow \lnot eev[0] \]
Definition 6.54 (Configuration Well-Formedness for Reduced MIPS-86 Executing MX). The
well-formedness of the MIPS-86 configuration for the MX simulation requires that the processor
core is in system mode, the store buffer is empty, and the maskable interrupts are masked:
\[
\begin{aligned}
\mathit{wfconf}^{MX}_{rMIPS}(d) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \lnot \mathit{mode}(d.cpu.core) \\
& \text{(ii)}\ \ d.cpu.sb = \varepsilon \\
& \text{(iii)}\ \ \forall i \in \{1, 7\}.\ d.cpu.core.spr(sr)[i] = 0
\end{aligned}
\]
Definition 6.55 (Well-Behaviour of Reduced MIPS-86 Executing MX). The well-behaviour of
a step of the reduced MIPS-86 machine encoding the MX semantics means that (i) the system
mode is preserved (under software conditions considered later), (ii) the compiler guarantees
that no illegal instructions appear in the code and memory accesses are properly aligned, and
(iii) the software conditions needed for the store buffer reduction are transferred. Using the
shorthands
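\[
\begin{aligned}
core &\equiv d.cpu.core & I &\equiv d.m_4(core.pc) \\
d' &\equiv \delta_{rMIPS}(d, in) & core' &\equiv d'.cpu.core
\end{aligned}
\]
we formally define
\[
\begin{aligned}
\mathit{wb}^{MX}_{rMIPS}(d, in) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \lnot \mathit{mode}(d'.cpu.core) \\
& \text{(ii)}\ \ \lnot \mathit{jisr}(core, I, eev, 0, 0) \\
& \text{(iii)}\ \ \mathit{sc}_{rMIPS}(d, in)
\end{aligned}
\]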
For brevity we provide here zeros instead of computation of page fault signals because we know
that the computation is made in system mode.
6.1.2.7 MX Software Conditions
We distinguish between static and dynamic software conditions. The static software conditions
restrict the code of a given program and how the linker places the compiled
program into the memory depending on the system. The dynamic software conditions are
properties of steps and mostly cannot be detected during the compilation and placement of the
compiled code into the memory.
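Recall from Section 4.2 that we split the memory into disjoint sets:
\[
\begin{gathered}
A_{hyp} \cup A_{guest} = \mathbb{B}^{32}, \quad A_{hyp} \cap A_{guest} = \emptyset, \quad A_{hyp} = A_{code} \cup A_{const} \cup A_{data} \\
\forall X, Y \in \{A_{code}, A_{const}, A_{data}\}.\ X \neq Y \Longrightarrow X \cap Y = \emptyset
\end{gathered}
\]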
Definition 6.56 (Static Software Conditions for MX). Obviously, one has to require that the
compiled code, data, and stack of the hypervisor/OS kernel reside in $A_{hyp}$. Taking into
account that the aforementioned sets are disjoint, we require the following static conditions: (i) the
compiled code of the mixed program of the hypervisor/OS kernel is placed into the code region
such that (ii) the binaries of C-IL and MASM programs do not overlap, (iii) – (iv) the stack and
global variables not of a constant-qualified type belong to $A_{data}$, (v) the C-IL constant data
reside in $A_{const}$⁶. Moreover, (vi) if a MASM procedure is supposed to be called from C-IL, it
must save all callee-save registers.
\[
\begin{aligned}
\mathit{sc}^{stat}_{MX}(\pi, info, \theta, cba, sba, mss) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ A^{code}_{MX}(info, cba) = A_{code} \\
& \text{(ii)}\ \ \mathit{valid}^{code}_{MX}(info, cba) \\
& \text{(iii)}\ \ A^{stack}_{MX}(sba, mss) \subset A_{data} \\
& \text{(iv)}\ \ A^{gvar}_{CIL}(\pi_{cil}, \theta) \setminus A^{const}_{CIL}(\pi_{cil}, \theta) \subset A_{data} \\
& \text{(v)}\ \ A^{const}_{CIL}(\pi_{cil}, \theta) = A_{const} \\
& \text{(vi)}\ \ \mathit{sc}^{prog}_{MX}(\pi, \theta)
\end{aligned}
\]
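Definition 6.57 (Dynamic Software Conditions for MX). In order to define the dynamic software
conditions, we consider $c' \in C_{MX\bot}$ such that $c' \equiv \delta^{\pi,\theta}_{MX}(c, in)$ and $c_\mu \equiv \mathit{conf}_{MASM}(c)$. Then
for a step of the mixed machine we require: (i) no run-time error, (ii) – (iii) no access to the stack
and code regions, (iv) no stack overflow, (v) at most one volatile access per step, (vi) the
mode and interrupt masking are not changed by the programmer, (vii) the access to the guest
memory region is only performed by operations suitable for IO-access, (viii) no misaligned
data accesses from MASM are made.
\[
\begin{aligned}
\mathit{sc}^{dyn}_{MX}(c, in, \pi, info, \theta, cba, sba, mss) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ c' \neq \bot \\
& \text{(ii)}\ \ \mathit{accad}_{MX}(c, \pi, \theta) \cap A^{stack}_{MX}(sba, mss) = \emptyset \\
& \text{(iii)}\ \ \mathit{accad}_{MX}(c, \pi, \theta) \cap A^{code}_{MX}(info, cba) = \emptyset \\
& \text{(iv)}\ \ \lnot \mathit{stackovf}^{\pi,\theta}_{MX}(\mathit{st}_{MX}(c'), info, sba, mss) \\
& \text{(v)}\ \ \mathit{sc}^{nIO}_{MX}(c, \pi, \theta) \\
& \text{(vi)}\ \ \forall r \in \{mode, sr\}.\ c'.spr(r) = c.spr(r) \\
& \text{(vii)}\ \ \mathit{accad}_{MX}(c, \pi, \theta) \cap A_{guest} \neq \emptyset \Longrightarrow \mathcal{IO}^{\pi,\theta}_{MX}(c) \\
& \text{(viii)}\ \ \mathit{masm}(c.ac) \land \mathit{instr}_{next}(c_\mu, \pi_\mu) \in I^{MASM}_{ASM} \Longrightarrow \lnot \mathit{dmal}(\mathit{core}_{MIPS}(c_\mu), I(c_\mu, \pi_\mu))
\end{aligned}
\]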
⁶ Local variables are not forbidden from being constant-qualified. However, since they are allocated only on the stack
and cannot be used for shared accesses, we include them into $A_{data}$.
The condition (vii) will be guaranteed by the safety policy when we consider the concurrent
model with the ownership. However, as we have seen from the software conditions in the
store buffer reduction for a step of the single-core MIPS-86, we need to guarantee that the guest
memory is not written by the instruction sw in the compiled code. Since we can rely on the fact
that volatile accesses are not translated to sw, we make this more general requirement.
Moreover, we need to require the absence of data misalignment during MASM steps explicitly.
As for instruction misalignment, it must be treated by the MASM compiler. Analogously,
the C-IL compiler is responsible for catching both kinds of misalignment during C-IL steps.
Note again that, for brevity, we do not formally state here the software condition requiring that
volatile variables/fields are not accessed by pointers to non-volatile types (see Section 6.1.2.4).
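Definition 6.58 (Software Conditions for MX Machine). Finally we combine the static and
dynamic software conditions:
\[
\begin{aligned}
\mathit{sc}_{MX}(c, in, \pi, info, \theta, cba, sba, mss) \overset{\mathrm{def}}{\equiv}\ & \text{(i)}\ \ \mathit{sc}^{stat}_{MX}(\pi, info, \theta, cba, sba, mss) \\
& \text{(ii)}\ \ \mathit{sc}^{dyn}_{MX}(c, in, \pi, info, \theta, cba, sba, mss)
\end{aligned}
\]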
6.1.2.8 Sequential MX Compiler Correctness in Concurrent Context
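Now, we define that the software conditions hold for all defined computations from a given
starting MX configuration $c_0 \in C_{MX}$ till the next consistency point in the following way:
\[
\begin{aligned}
& \mathit{SC}^{seq}_{MX}(c_0, \pi, info, \theta, cba, sba, mss, icm) \overset{\mathrm{def}}{\equiv} \\
& \quad \forall n \in \mathbb{N},\ c \in (C_{MX\bot})^{n+1},\ \lambda \in (\Sigma_{MX})^{n}. \\
& \qquad \text{(i)}\ \ c_1 = c_0 \land \big(c_1 \longrightarrow^{n}_{\delta^{\pi,\theta}_{MX},\lambda} c_{n+1}\big) \\
& \qquad \text{(ii)}\ \ \forall i \in [2 : n].\ c_i \neq \bot \Longrightarrow \lnot \mathit{cp}_{MX}(c_i, info) \\
& \qquad \text{(iii)}\ \ c_{n+1} \neq \bot \Longrightarrow \mathit{cp}_{MX}(c_{n+1}, info) \\
& \quad \Longrightarrow \\
& \qquad \text{(i)}\ \ \forall i \in \mathbb{N}_n.\ \mathit{sc}_{MX}(c_i, \lambda_i, \pi, info, \theta, cba, sba, mss) \land \mathit{noacc}^{\pi,\theta}_{MX}(c_i, icm)\ \land \\
& \qquad \text{(ii)}\ \ c_{n+1} \neq \bot \Longrightarrow \mathit{wfconf}^{\pi,\theta}_{MX}(c_{n+1})
\end{aligned}
\]
Note that the definition corresponds to one of the premises in the generalized sequential
simulation theorem.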
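For non-empty sequences $d \in (C_{MIPS})^*$, $\sigma \in (\Sigma_{MIPS})^*$, such that $|d| = |\sigma| + 1$, and
$c \in (C_{MX})^*$, $\tau \in (\Sigma_{MX})^*$ with $|c| = |\tau| + 1$ we define how IO- and OT-points are matched (wrt.
Definition 2.32) between the MX machine and the MIPS-86 ISA implementing it:
\[
\begin{aligned}
& \mathit{oneIO}^{MX}_{rMIPS}(d, \sigma, c, \tau, \pi, info, \theta, cba) \overset{\mathrm{def}}{\equiv} \\
& \quad \text{(i)}\ \ \forall i, j \in \mathbb{N}_{|\tau|}.\ \mathcal{IO}^{\pi,\theta}_{MX}(c_i) \land \mathcal{IO}^{\pi,\theta}_{MX}(c_j) \Longrightarrow i = j \\
& \quad \text{(ii)}\ \ \forall i, j \in \mathbb{N}_{|\sigma|}.\ \mathcal{IO}^{MX}_{rMIPS}(d_i, \pi, info, \theta, cba) \land \mathcal{IO}^{MX}_{rMIPS}(d_j, \pi, info, \theta, cba) \Longrightarrow i = j \\
& \quad \text{(iii)}\ \ \big(\exists i \in \mathbb{N}_{|\tau|}.\ \mathcal{IO}^{\pi,\theta}_{MX}(c_i)\big) \Longrightarrow \big(\exists i \in \mathbb{N}_{|\sigma|}.\ \mathcal{IO}^{MX}_{rMIPS}(d_i, \pi, info, \theta, cba)\big) \\
& \quad \text{(iv)}\ \ \big(\exists i \in \mathbb{N}_{|\tau|}.\ \mathcal{OT}^{\pi,\theta}_{MX}(c_i)\big) \Longleftrightarrow \big(\exists i \in \mathbb{N}_{|\sigma|}.\ \mathcal{OT}^{MX}_{rMIPS}(d_i)\big)
\end{aligned}
\]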
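The four matching conditions can be read as simple checks over per-step flags. In the Python sketch below, which is only an illustration with hypothetical names, each list holds one boolean per step of the respective machine, stating whether that step is at an IO-point or an OT-point.

```python
from typing import Sequence


def one_io(io_abs: Sequence[bool], ot_abs: Sequence[bool],
           io_conc: Sequence[bool], ot_conc: Sequence[bool]) -> bool:
    """Check the matching of IO- and OT-points between the two step sequences."""
    at_most_one_abs = sum(io_abs) <= 1           # (i)  at most one abstract IO-point
    at_most_one_conc = sum(io_conc) <= 1         # (ii) at most one concrete IO-point
    io_implied = (not any(io_abs)) or any(io_conc)  # (iii) abstract IO implies concrete IO
    ot_equivalent = any(ot_abs) == any(ot_conc)  # (iv) OT-points exist on both sides or on neither
    return at_most_one_abs and at_most_one_conc and io_implied and ot_equivalent
```

Note that condition (iii) is only an implication: the concrete block may contain an IO-point without an abstract counterpart, whereas OT-points must agree in both directions.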
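Additionally we define the shorthands
\[
\begin{aligned}
\mathit{nocp}^{MX}_{rMIPS}(d, \pi, info, \theta, cba) &\overset{\mathrm{def}}{\equiv} \forall i \in \mathbb{N}_{|d|}.\ \lnot \mathit{cp}^{MX}_{rMIPS}(d_i, \pi, info, \theta, cba) \\
\mathit{nocp}_{MX}(c, info) &\overset{\mathrm{def}}{\equiv} \forall i \in \mathbb{N}_{|c|}.\ \lnot \mathit{cp}_{MX}(c_i, info)
\end{aligned}
\]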
Finally, we can state the sequential mixed machine compiler correctness in the concurrent
setting. Later, when we consider the justification of the concurrent MX model, we will use this
compiler correctness in order to instantiate the sequential simulation framework and obtain
Theorem 2.3 for the given models.
Theorem 6.1 (Sequential MX Compiler Correctness in the Concurrent Context). Given any
mixed program pi, code base addresses cba, base address sba of the stack and its maximal size mss, C-IL
environment parameters θ, compiler information info, MX machine configuration c0, possible inconsis-
tent portion of memory icm, and non-empty sequences d and ω of configurations and inputs respectively
for the store buffer reduced MIPS-86 machine such that
(i) the MX configuration c0 and the starting MIPS-86 configuration d1 of the sequence d are well-
formed,
(ii) both machines are in consistency points in c0 and d1,
(iii) coupled via the MX compiler consistency relation wrt. the compiler information info and possible
inconsistent portion of memory icm belonging to the hypervisor or OS kernel addresses,
(iv) the sequence ω of MIPS-86 inputs is suitable for the simulation (has no active reset signal) and d
is computed by steps of the reduced MIPS-86 machine with the inputs from ω.
(v) There are no consistency points between the first and the last configurations of d.
(vi) Moreover, the MX software conditions hold and addresses icm are not accessed during all defined
MX steps from the given configuration c0 till the next consistency point. Additionally, the MX
machine has a well-formed configuration in this consistency point.
Then, there exist pairs of sequences (d′, σ) and (c, τ) of configurations and inputs for the reduced MIPS-
86 and MX machines respectively such that
(i) the sequences d and ω represent prefixes of d′ and σ,
(ii) σ contains only suitable inputs and configurations d′ are obtained by SB reduced MIPS-86 ISA
steps with the inputs from σ,
(iii) these steps reach a consistency point at the end of d′ not containing other consistency points in
between,
(iv) are well-behaved and lead to the well-formed configuration at the end of d′.
(v) In turn, the computation of configurations in c with c1 = c0 is defined for the inputs τ by the
semantics of the MX machine having at the end of the computation a well-formed state
(vi) with a consistency point next after the starting configuration c0.
(vii) Moreover, the compiler guarantees that consistency points for the MX machine are inserted at least
before and after function/procedure calls, at first statements of function bodies, at IO-points, and
(viii) IO- and OT -points are properly matched between the abstract MX machine and its implementa-
tion by the reduced MIPS-86 model.
(ix) At the end of the computation both machines are also coupled via the MX compiler consistency
relation wrt. info and the same icm.
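\[
\begin{aligned}
& \forall \pi \in \mathit{Prog}_{MX},\ cba \in \mathbb{B}^{32} \times \mathbb{B}^{32},\ sba \in \mathbb{B}^{32},\ mss \in \mathbb{N},\ \theta \in \mathit{Params}_{CIL}, \\
& \quad c_0 \in C_{MX},\ info \in \mathit{infoT}_{MX},\ icm \in 2^{\mathbb{B}^{32}},\ k \in \mathbb{N},\ d \in (C_{MIPS})^{k+1},\ \omega \in (\Sigma_{MIPS})^{k}. \\
& \qquad \text{(i)}\ \ \mathit{wfconf}^{\pi,\theta}_{MX}(c_0) \land \mathit{wfconf}^{MX}_{rMIPS}(d_1) \\
& \qquad \text{(ii)}\ \ \mathit{cp}_{MX}(c_0, info) \land \mathit{cp}^{MX}_{rMIPS}(d_1, \pi, info, \theta, cba) \\
& \qquad \text{(iii)}\ \ icm \subset A_{hyp} \land \mathit{consis}_{MX}(c_0, d_1, \pi, \theta, info, cba, sba, mss, icm) \\
& \qquad \text{(iv)}\ \ \big(d_1 \longrightarrow^{k}_{\delta_{rMIPS},\omega} d_{k+1}\big) \land \forall i \in \mathbb{N}_k.\ \mathit{suit}^{MX}_{rMIPS}(\omega_i) \\
& \qquad \text{(v)}\ \ \mathit{nocp}^{MX}_{rMIPS}(d[2 : k], \pi, info, \theta, cba) \\
& \qquad \text{(vi)}\ \ \mathit{SC}^{seq}_{MX}(c_0, \pi, info, \theta, cba, sba, mss, icm) \\
& \quad \Longrightarrow \\
& \quad \exists n \in \mathbb{N},\ d' \in (C_{MIPS})^{n+1},\ \sigma \in (\Sigma_{MIPS})^{n},\ m \in \mathbb{N},\ c \in (C_{MX})^{m+1},\ \tau \in (\Sigma_{MX})^{m}. \\
& \qquad \text{(i)}\ \ n \geq k \land d'[1 : k+1] = d \land \sigma[1 : k] = \omega \\
& \qquad \text{(ii)}\ \ \big(d'_1 \longrightarrow^{n}_{\delta_{rMIPS},\sigma} d'_{n+1}\big) \land \forall i \in \mathbb{N}_n.\ \mathit{suit}^{MX}_{rMIPS}(\sigma_i) \\
& \qquad \text{(iii)}\ \ \mathit{cp}^{MX}_{rMIPS}(d'_{n+1}, \pi, info, \theta, cba) \land \mathit{nocp}^{MX}_{rMIPS}(d'[2 : n], \pi, info, \theta, cba) \\
& \qquad \text{(iv)}\ \ \mathit{wfconf}^{MX}_{rMIPS}(d'_{n+1}) \land \forall i \in \mathbb{N}_n.\ \mathit{wb}^{MX}_{rMIPS}(d'_i, \sigma_i) \\
& \qquad \text{(v)}\ \ c_1 = c_0 \land \big(c_1 \longrightarrow^{m}_{\delta^{\pi,\theta}_{MX},\tau} c_{m+1}\big) \land \mathit{wfconf}^{\pi,\theta}_{MX}(c_{m+1}) \\
& \qquad \text{(vi)}\ \ \mathit{cp}_{MX}(c_{m+1}, info) \land \mathit{nocp}_{MX}(c[2 : m], info) \\
& \qquad \text{(vii)}\ \ \mathit{valid}^{cp}_{MX}(\pi, info, \theta) \land \mathit{IOcp}^{\pi,\theta}_{MX}(c_{m+1}, info) \\
& \qquad \text{(viii)}\ \ \mathit{oneIO}^{MX}_{rMIPS}(d', \sigma, c, \tau, \pi, info, \theta, cba) \\
& \qquad \text{(ix)}\ \ \mathit{consis}_{MX}(c_{m+1}, d'_{n+1}, \pi, \theta, info, cba, sba, mss, icm)
\end{aligned}
\]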
In comparison to the usual sequential compiler correctness proven for $icm = \emptyset$, the sequential
compiler correctness needed for concurrent systems requires additional guarantees such as
partial consistency in case of environment steps, the correct implementation of steps suitable
for IO-operations and ownership transfer, etc.
6.2 Concurrent MX Model Justification
6.2.1 Cosmos Model Instantiations
Concurrent Mixed Machine
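Given a program $\pi = (\pi_\mu, \pi_{cil}) \in \mathit{Prog}_{MX}$, the environment parameters $\theta \in \mathit{Params}_{CIL}$, and
the system information $\xi \equiv (cba, sba, mss)$ with $cba \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, $sba \in (\mathbb{B}^{32})^{np}$, $mss \in (\mathbb{N})^{np}$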
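wrt. the number $np \in \mathbb{N}$ of processors in the multi-core MIPS-86 machine such that
\[
\mathit{valid}^{stacks}_{MX}(np, sba, mss) \overset{\mathrm{def}}{\equiv} \forall p, q \in \mathbb{N}_{np}.\ p \neq q \Longrightarrow A^{stack}_{MX}(sba_p, mss_p) \cap A^{stack}_{MX}(sba_q, mss_q) = \emptyset
\]
we can now define the instantiation $S^{\pi,\theta,\xi}_{MX} \in \mathbb{S}$ of the Cosmos model for the mixed machine.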
Let $c_{mx}(u, m) \equiv \mathit{conf}_{MX}(u, \lceil m \rceil_{\mathbb{B}^{32}})$ be a shorthand denoting a sequential MX machine
configuration obtained on the basis of the unit's configuration $u$ and a memory $m$ restricted by the
set of addresses considered in the component of the Cosmos model signature. Moreover, the
result of the compilation is $info \equiv (\mathit{cmpl}_{MASM}(\pi_\mu), \mathit{cmpl}_{CIL}(\pi_{cil}))$. Then the components are
instantiated as follows:
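• $S^{\pi,\theta,\xi}_{MX}.\mathcal{A} = \mathbb{B}^{32} \setminus \big(A^{code}_{MX}(info, cba) \cup \bigcup_{p \in \mathbb{N}_{np}} A^{stack}_{MX}(sba_p, mss_p)\big)$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{V} = \mathbb{B}^{8}$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{R} = A^{const}_{CIL}(\pi_{cil}, \theta)$
• $S^{\pi,\theta,\xi}_{MX}.nu = np$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{U} = K_{MX\bot}$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{E} = \Sigma_{MX}$
• $S^{\pi,\theta,\xi}_{MX}.\mathit{reads}(u, m, in) = \begin{cases} \mathit{reads}_{MX}(c_{mx}(u, m), \pi, \theta) & : u \neq \bot \\ \emptyset & : \text{otherwise} \end{cases}$
• $S^{\pi,\theta,\xi}_{MX}.\delta(u, m, in) = \begin{cases} \big(u', c'_{mx}.M|_{\mathit{writes}_{MX}(c_{mx}(u, m), \pi, \theta)}\big) & : u \neq \bot \land c'_{mx} \neq \bot \\ (\bot, m_\emptyset) & : u \neq \bot \land c'_{mx} = \bot \\ \text{undefined} & : \text{otherwise} \end{cases}$
where the next sequential MX machine configuration $c'_{mx} \equiv \delta^{\pi,\theta}_{MX}(c_{mx}(u, m), in)$ is used to
compute the unit's next configuration $u' \equiv (c'_{mx}.ac, c'_{mx}.ic, c'_{mx}.spr)$ and $m_\emptyset$ is the empty
function satisfying $\mathrm{dom}(m_\emptyset) = \emptyset$ ⁷.
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{IP}(u, m, in) = (u \neq \bot \Longrightarrow \mathit{cp}_{MX}(u.ac, info))$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{IO}(u, m, in) = \big(u \neq \bot \Longrightarrow \mathcal{IO}^{\pi,\theta}_{MX}(c_{mx}(u, m))\big)$
• $S^{\pi,\theta,\xi}_{MX}.\mathcal{OT}(u, m, in) = \big(u \neq \bot \Longrightarrow \mathcal{OT}^{\pi,\theta}_{MX}(c_{mx}(u, m))\big)$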
Note that the memory m in the last three components represents the read-only memory, there-
fore we are not allowed to consider the whole memory for identification of consistency points
and IO-/OT -points. In fact, we took this into account already in the definitions for the mixed
machine, where we do not rely on the memory configuration at all. The whole MX configuration
cmx(u,m) is used here only for brevity.
⁷ Recall that according to the Cosmos model signature from Definition 2.1 the result of the transition is a unit configuration
(instantiated as $K_{MX\bot}$ for our machine here) and a part of the memory modified during the step. Therefore, in
order to match this formalism, we get $S^{\pi,\theta,\xi}_{MX}.\delta(u, m, in) = (\bot, m_\emptyset)$ in case of an MX run-time error.
SB Reduced Multi-Core MIPS-86 Implementing Concurrent MX Machine
In Section 4.3.2, for the proof of the store buffer reduction, we have already considered the
instantiation of the Cosmos model with the reduced MIPS-86 machine. Since we allowed the
interleaving in every step and detected IO-/OT-points independently of the compiled code,
we adapt the instantiation $S_{rMIPS} \in \mathbb{S}$ to $S^{\pi,\theta,\xi}_{rMIPS} \in \mathbb{S}$, taking into account the compilation of
the MX program and the choice of the consistency points as well as the steps suitable for
IO-operations and ownership transfer.
• For components $X \in \{\mathcal{A}, \mathcal{V}, \mathcal{R}, nu, \mathcal{U}, \mathcal{E}, \mathit{reads}, \delta\}$ the instantiation is equal to the one for
the reduced machine
\[ S^{\pi,\theta,\xi}_{rMIPS}.X = S_{rMIPS}.X \]
with $A_{code} = A^{code}_{MX}(info, cba)$ and $A_{const} = A^{const}_{CIL}(\pi_{cil}, \theta)$.
• $S^{\pi,\theta,\xi}_{rMIPS}.\mathcal{IP}(u, m, in) = \mathit{cp}^{MX}_{rMIPS}(u.core, \pi, info, \theta, cba)$
• $S^{\pi,\theta,\xi}_{rMIPS}.\mathcal{IO}(u, m, in) = \mathcal{IO}^{MX}_{rMIPS}((u, \lceil m \rceil_{\mathbb{B}^{32}}), \pi, info, \theta, cba)$
• $S^{\pi,\theta,\xi}_{rMIPS}.\mathcal{OT}(u, m, in) = \mathcal{OT}^{MX}_{rMIPS}((u, \lceil m \rceil_{\mathbb{B}^{32}}))$
Note again that the definitions of $\mathcal{IO}^{MX}_{rMIPS}$ and $\mathcal{OT}^{MX}_{rMIPS}$ rely on the memory region containing
the compiled code, which, in turn, is treated as read-only memory in our instantiation.
6.2.2 Sequential Simulation Theorem
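Now, taking into account the formulation of Theorem 6.1, we instantiate the sequential
simulation framework $R^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi) \in \mathbb{R}$ for the two Cosmos models $S^{\pi,\theta,\xi}_{MX}, S^{\pi,\theta,\xi}_{rMIPS} \in \mathbb{S}$ defined wrt.
the given $\pi$, $\theta$, and $\xi$.
For $d \in C_{proc} \times C_m$, $d = (cpu, m)$, and $c \in K_{MX\bot} \times \big(S^{\pi,\theta,\xi}_{MX}.\mathcal{A} \to S^{\pi,\theta,\xi}_{MX}.\mathcal{V}\big)$, $c = (k, M)$, and
$info \in \mathit{infoT}_{MX}$, $icm \in 2^{\mathbb{B}^{32}}$ we define
\[
R^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi).
\begin{cases}
P = \mathit{infoT}_{MX} \\
\mathit{sim}(d, c, info, icm) = icm \subset A_{hyp} \land \mathit{valid}^{cp}_{MX}(\pi, info, \theta) \land \mathit{IOcp}^{\pi,\theta}_{MX}(c, info)\ \land \\
\qquad \big(k \neq \bot \Longrightarrow \mathit{consis}_{MX}(c_{mx}(k, M), d, \pi, \theta, info, cba, sba, mss, icm')\big) \\
\mathit{CPa}(k, info) = (k \neq \bot \Longrightarrow \mathit{cp}_{MX}(k.ac, info)) \\
\mathit{CPc}(cpu, info) = \mathit{cp}^{MX}_{rMIPS}(cpu.core, \pi, info, \theta, cba) \\
\mathit{wfa}(c) = k \neq \bot \land \mathit{wfconf}^{\pi,\theta}_{MX}(c_{mx}(k, M)) \\
\mathit{wfc}(d) = \mathit{wfconf}^{MX}_{rMIPS}(d) \\
\mathit{suit}(in_d) = \mathit{suit}^{MX}_{rMIPS}(in_d) \\
\mathit{sc}(c, in_c, info) = \mathit{valid}^{stacks}_{MX}(np, sba, mss) \land \big(k \neq \bot \Longrightarrow \mathit{sc}_{MX}(c_{mx}(k, M), in_c, \pi, info, \theta, cba, sba, mss)\big) \\
\mathit{wb}(d, in_d, info) = \mathit{wb}^{MX}_{rMIPS}(d, in_d)
\end{cases}
\]
where $icm' \equiv icm \cup \bigcup_{p \in \mathbb{N}_{np}} A^{stack}_{MX}(sba_p, mss_p)$.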
Note that since in Theorem 6.1 we use more predicates than are provided by the sequential
simulation framework in Definition 2.30, we have to combine them properly in the framework
instantiation. Moreover, the unit configuration k of the MX Cosmos machine can be the run-
time error state ⊥ and must be excluded in the application of MX predicates undefined for it.
Additionally to the MX software conditions, we also require that the stacks of the concurrent
MX machine do not overlap.
Once Theorem 6.1 is proven by a compiler developer, it is easy to show (by unfolding the
definitions) that the generalized sequential simulation Theorem 2.3 also holds for our Cosmos
models $S^{\pi,\theta,\xi}_{MX}, S^{\pi,\theta,\xi}_{rMIPS} \in \mathbb{S}$ and the simulation framework $R^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi) \in \mathbb{R}$ instantiated wrt.
the given program $\pi$ and the parameters $\theta$, $\xi$. Therefore, the sequential MX compiler correctness
in the concurrent setting in terms of Cosmos machines can be stated in the following way:
Theorem 6.2 (Sequential MX Compiler Correctness for Cosmos Model Simulation). The generalized
sequential simulation Theorem 2.3 holds for any Cosmos models $S^{\pi,\theta,\xi}_{MX}, S^{\pi,\theta,\xi}_{rMIPS} \in \mathbb{S}$ and the
simulation framework $R^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi) \in \mathbb{R}$ instantiated wrt. any given mixed machine program
$\pi = (\pi_\mu, \pi_{cil}) \in \mathit{Prog}_{MX}$, the environment parameters $\theta \in \mathit{Params}_{CIL}$, and the system information
$\xi \equiv (cba, sba, mss)$ with $cba \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, $sba \in (\mathbb{B}^{32})^{np}$, $mss \in (\mathbb{N})^{np}$ and the number $np \in \mathbb{N}$ of
processors in the multi-core MIPS-86 machine such that the condition $\mathit{valid}^{stacks}_{MX}(np, sba, mss)$ holds.
6.2.3 Concurrent Model Simulation
Finally, in order to show that the execution of the compiled code of any mixed program on
the multi-core MIPS-86 machine corresponds to the steps of the concurrent mixed machine, we
instantiate other predicates needed for the Cosmos model simulation theorem and the proof of
the required assumptions.
Again, we first consider the simulation for machines with the given program pi and the pa-
rameters θ, ξ.
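Definition 6.59 (Shared Invariant for Concurrent Mixed Machine Simulation).
\[
\begin{aligned}
& \mathit{sinv}^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi)\big((m, \mathcal{S}, \mathcal{R}, \mathcal{O}), (m_{mx}, \mathcal{S}_{mx}, \mathcal{R}_{mx}, \mathcal{O}_{mx}), info\big) \overset{\mathrm{def}}{\equiv} \\
& \quad \text{(i)}\ \ \mathcal{S} = \mathcal{S}_{mx} \\
& \quad \text{(ii)}\ \ \mathcal{R} = \mathcal{R}_{mx} \cup A^{code}_{MX}(info, cba) \\
& \quad \text{(iii)}\ \ \forall p \in \mathbb{N}_{nu}.\ \mathcal{O}(p) = \mathcal{O}_{mx}(p) \cup A^{stack}_{MX}(sba_p, mss_p) \\
& \quad \text{(iv)}\ \ \forall p \in \mathbb{N}_{nu}.\ A^{stack}_{MX}(sba_p, mss_p) \cap \mathcal{S} = \emptyset \\
& \quad \text{(v)}\ \ m|_{\mathcal{S}_{mx} \cup \mathcal{R}_{mx}} = m_{mx}
\end{aligned}
\]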
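To make the five conjuncts concrete, the following Python sketch checks them on finite address sets. It is only an illustration under simplifying assumptions: memories are dictionaries from addresses to byte values, ownership is a list of sets indexed by processor, and the code region and the stack regions are passed in as precomputed sets (the parameter names are hypothetical).

```python
from typing import Dict, List, Set


def shared_invariant(m: Dict[int, int], S: Set[int], R: Set[int], O: List[Set[int]],
                     m_mx: Dict[int, int], S_mx: Set[int], R_mx: Set[int],
                     O_mx: List[Set[int]],
                     a_code: Set[int], stacks: List[Set[int]]) -> bool:
    """Check conjuncts (i) to (v) of the shared invariant on finite sets."""
    procs = range(len(stacks))
    visible = S_mx | R_mx
    return (
        S == S_mx                                               # (i)
        and R == R_mx | a_code                                  # (ii)
        and all(O[p] == O_mx[p] | stacks[p] for p in procs)     # (iii)
        and all(not (stacks[p] & S) for p in procs)             # (iv)
        and {a: m[a] for a in visible if a in m} == m_mx        # (v)
    )
```

In this reading, the concrete machine additionally treats the code region as read-only and the stack of each processor as owned by that processor, while the shared and read-only data visible to the mixed machine coincide.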
The part (iv) was not explicitly required in [Bau14b]. However, it is needed in our work
because of the further property transfer. Obviously, the stack currently used by a processor
should not be shared among others.
Analogously to the store buffer reduction we restrict the ownership safety policy for the con-
current mixed machine.
Definition 6.60 (Restriction on the Ownership Transfer in Concurrent MX Machine). For any
configuration $E \in C_{S^{\pi,\theta,\xi}_{MX}}$ of the MX Cosmos machine we require that $A_{guest}$ is always treated as
shared and no addresses from $A_{guest}$ can be owned by any execution unit. Formally, we define
this by the property instantiating $P_{S_a}$ in Theorem 2.4:
\[
P_{S^{\pi,\theta,\xi}_{MX}}(E) \overset{\mathrm{def}}{\equiv} A_{guest} \subset E.\mathcal{S} \land \forall p \in \mathbb{N}_{nu}.\ E.\mathcal{O}_p \cap A_{guest} = \emptyset
\]
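On finite sets this restriction is a two-line check. The Python sketch below is only illustrative; the function name is hypothetical, and the inclusion is read as a non-strict subset relation, which is an assumption about the notation.

```python
from typing import List, Set


def guest_always_shared(a_guest: Set[int], shared: Set[int],
                        owned: List[Set[int]]) -> bool:
    """Guest addresses are all shared and none of them is owned by any unit."""
    return a_guest <= shared and all(not (o & a_guest) for o in owned)
```

For example, with guest addresses {1, 2}, shared addresses {1, 2, 3}, and ownership sets [{7}, set()], the property holds; it fails as soon as some unit owns address 1.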
Since there is no need to make further restrictions on the unit’s configuration of the SB re-
duced machine, we set
\[ \mathit{uinv}^{S_{MX}}_{S_{rMIPS}}(\pi, \theta, \xi)(cpu, \mathcal{O}, \mathcal{S}) \overset{\mathrm{def}}{\equiv} 1 \]
Now, in order to guarantee that the Cosmos model simulation Theorem 2.4 holds for the given models, sequential simulation framework, and additional predicates instantiated for the given $\pi, \theta, \xi$, one has to discharge Assumptions 2.1–2.4. Since the implementation of both C-IL and MASM compilers is out of the scope of this thesis, we do not consider the proof of the assumptions here. The main ideas of such proofs for MASM and C-IL separately, however, can be found in [Bau14b].
Obviously, we also have to guarantee that Theorem 2.4 holds for all our instantiations.
Theorem 6.3 (Cosmos Model Simulation Theorem for all Mixed Machine Programs and System/Environment Parameters). Theorem 2.4 holds for any mixed machine programs $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$, the environment parameters $\theta \in Params_{CIL}$, and the system information $\xi \equiv (cba, sba, mss)$ used for instantiation of the models $S^{\pi,\theta,\xi}_{MX}, S^{\pi,\theta,\xi}_{rMIPS} \in \mathbb{S}$.
6.2.4 Application of the Concurrent MX Machine Simulation
The overall model stack for the justification of the concurrent MX machine implemented on the multi-core MIPS-86 is depicted in Figure 6.9 and considered in this section. We discuss the application of the theorems introduced so far and sketch the transfer of safety and properties required for the overall simulation.
6.2.4.1 Properties Transfer from MX Machine to Reduced MIPS-86 ISA
Above we have stated the simulation theorem between the store buffer reduced multi-core MIPS-86 model $S^{\pi,\theta,\xi}_{rMIPS}$ and the concurrent mixed machine $S^{\pi,\theta,\xi}_{MX}$. This theorem can only be applied to the reordered steps of MIPS-86 composing incomplete consistency blocks.
The justification of this reordering, which preserves verified properties and safety, was considered in detail in Section 2.4, where we transformed the concrete machine with the well-behaviour proven as a property of steps to an extended machine with the same property based only on a Cosmos machine configuration. In the context of the concurrent MX justification we denote this machine by $S'^{\pi,\theta,\xi}_{rMIPS}$, depicted in Figure 6.9. However, since we have made Assumption 2.5 on the well-behaviour and the transfer of any abstract property must satisfy the requirements introduced in Definitions 2.48 – 2.49, we have to discharge them here.
Recall that according to Assumption 2.5 the well-behaviour of the MIPS-86 machine should
not rely on the memory outside of the reads-set. For the proof of this assumption we consider a
similar lemma.
Lemma 6.1 (MIPS-86 Well-Behaviour Restriction for Concurrent MX Simulation).
For all configurations $(cpu, m) \in C_{MIPS}$ and inputs $in \in \Sigma_{MIPS}$, we have
$$wb^{MX}_{rMIPS}((cpu,m), in) \iff \big(\forall m'.\ m'|_R = m|_R \Longrightarrow wb^{MX}_{rMIPS}((cpu,m'), in)\big)$$
where the reads-set $R$ is computed as $R \equiv reads_{MIPS}((cpu,m), in)$.
Proof: The proof is trivial bookkeeping, based on the fact that the definition of $wb^{MX}_{rMIPS}$ relies only on the processor core configuration and the instruction fetched in system mode. The byte addresses at which the instruction resides in the memory are included into the reads-set by definition.
[Figure 6.9 (diagram). Model levels, from top to bottom: $S^{\pi,\theta,\xi}_{MX}$ with complete consistency block schedules; $S'^{\pi,\theta,\xi}_{rMIPS}$ with incomplete consistency block schedules; $S'^{\pi,\theta,\xi}_{rMIPS}$ with arbitrary schedules; $S_{rMIPS}$ with instantiated address space; $S_{MIPS}$ with instantiated address space. Labels between the levels: Simulation; Order reduction; Simple coupling simulation; SB reduction; one step is marked with *. Side label: Transfer of Ownership Safety, Properties (including not needed for the simulation), and Software Conditions.]
Figure 6.9: Justification of the concurrent MX implemented on multi-core MIPS-86. As a special case of the properties transfer marked with * we consider obtaining the software conditions for store buffer reduction from the well-behaviour proven for the reduced MIPS-86 machine implementing the concurrent MX semantics.
Moreover, later for the application of the store buffer reduction we need the property $P_{S_{rMIPS}}(E)$ for $E \in C_{S_{rMIPS}}$. We have already required $P_{S^{\pi,\theta,\xi}_{MX}}(C)$ for the complete consistency blocks of the instantiated MX model with configurations $C \in C_{S^{\pi,\theta,\xi}_{MX}}$. Since there is a slight difference between $S'^{\pi,\theta,\xi}_{rMIPS}$ and $S_{rMIPS}$, first we are interested in the transfer of the property $P_{S^{\pi,\theta,\xi}_{MX}}(C)$ to a similar one for $S'^{\pi,\theta,\xi}_{rMIPS}$, i.e., for $D \in C_{S'^{\pi,\theta,\xi}_{rMIPS}}$
$$P_{S'^{\pi,\theta,\xi}_{rMIPS}}(D) \overset{def}{\equiv} A_{guest} \subset D.S \ \wedge\ \forall p \in \mathbb{N}_{nu}.\ D.O_p \cap A_{guest} = \emptyset$$
Since both $P_{S^{\pi,\theta,\xi}_{MX}}(C)$ and $P_{S'^{\pi,\theta,\xi}_{rMIPS}}(D)$ are divisible according to Definition 2.48 and they are global properties, we easily prove the property transfer in the way stated in Definition 2.49.
Lemma 6.2 (Transfer of the Property $P_{S^{\pi,\theta,\xi}_{MX}}$).
$$\forall D, C, par.\ sinv(D, C, par) \Longrightarrow \big(P_{S'^{\pi,\theta,\xi}_{rMIPS}}(D) = P_{S^{\pi,\theta,\xi}_{MX}}(C)\big)$$
Proof: The proof follows directly from the shared invariant. Since the shared addresses of both machines are equal, one easily concludes
$$(A_{guest} \subset D.S) = (A_{guest} \subset C.S) \qquad (6.1)$$
Moreover, using $sinv(D, C, par).(iii)$ for any $p \in \mathbb{N}_{nu}$ we can transform
$$(D.O_p \cap A_{guest} = \emptyset) = \big((C.O_p \cap A_{guest} = \emptyset) \wedge (A^{stack}_{MX}(sba_p, mss_p) \cap A_{guest} = \emptyset)\big) \qquad (6.2)$$
We proceed with the rest of the proof in both directions. Taking into account the equation (6.1), we need to discharge the following claims:
• $(A_{guest} \subset C.S \wedge C.O_p \cap A_{guest} = \emptyset) \Longrightarrow D.O_p \cap A_{guest} = \emptyset$
By the equations (6.1) – (6.2) the claim to be proven simplifies to
$$A_{guest} \subset D.S \Longrightarrow A^{stack}_{MX}(sba_p, mss_p) \cap A_{guest} = \emptyset$$
From $sinv(D, C, par).(iv)$ we know that the stack cannot be shared, i.e., $A^{stack}_{MX}(sba_p, mss_p) \cap S = \emptyset$ holds. Hence, using $A_{guest} \subset D.S$ we conclude that it also does not overlap with the guests/processes address space $A_{guest}$.
• $D.O_p \cap A_{guest} = \emptyset \Longrightarrow C.O_p \cap A_{guest} = \emptyset$
Using the equation (6.2) we easily conclude this claim and finish the proof of the lemma.
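The set-theoretic core of this argument can be sanity-checked by exhaustive enumeration over a tiny address universe. The sketch below is not a substitute for the proof; it models a single execution unit, uses hypothetical helper names, and reads the subset relation as non-strict inclusion.

```python
from itertools import chain, combinations, product


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def guest_policy(shared, owned, guest):
    """P(E): guest addresses are shared and owned by no unit."""
    return guest <= shared and all(o.isdisjoint(guest) for o in owned)


def lemma_6_2_holds(universe):
    """Check P(D) == P(C) for every choice of sets satisfying sinv (i), (iii), (iv)."""
    for S, Oc, stack, guest in product(subsets(universe), repeat=4):
        if not stack.isdisjoint(S):      # clause (iv): stack is not shared
            continue
        Od = Oc | stack                  # clause (iii): D owns the stack as well
        if guest_policy(S, [Od], guest) != guest_policy(S, [Oc], guest):
            return False
    return True
```

Dropping the clause (iv) filter makes the check fail, which illustrates why this clause had to be added to the shared invariant.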
Along with the property $P_{S^{\pi,\theta,\xi}_{MX}}$ for the MX Cosmos machine, one could be interested in the transfer of any other abstract property $P'_{S^{\pi,\theta,\xi}_{MX}}$ such that the simulation hypothesis of the concurrent simulation theorem holds for start configurations $C \in C_{S^{\pi,\theta,\xi}_{MX}}$, $D \in C_{S'^{\pi,\theta,\xi}_{rMIPS}}$, and a simulation parameter $par \in infoT_{MX}$.
If this additional property $P'_{S^{\pi,\theta,\xi}_{MX}}$ is divisible and can be translated into the incompletely simulated Cosmos machine property $Q[P'_{S^{\pi,\theta,\xi}_{MX}}, par]$ wrt. Definitions 2.49 – 2.50, then applying Theorem 2.6 and Lemmas 6.1 – 6.2 we conclude that any suitable Cosmos machine schedule leaving $D$ is safe wrt. not only the ownership and $P_{S'^{\pi,\theta,\xi}_{rMIPS}}$, but also the translated property $Q[P'_{S^{\pi,\theta,\xi}_{MX}}, par]$. Moreover, all implementing computations of $S'^{\pi,\theta,\xi}_{rMIPS}$ are well-behaved.
Corollary 6.1 (Well-Behaviour and Safety Transfer for Arbitrary Schedules in MX Simulation). Let the transferred property $\tilde{P}_{S'^{\pi,\theta,\xi}_{rMIPS}}(d)$ be defined for $d \in C_{S'^{\pi,\theta,\xi}_{rMIPS}}$ as
$$\tilde{P}_{S'^{\pi,\theta,\xi}_{rMIPS}}(d) \equiv W(d) \wedge P_{S'^{\pi,\theta,\xi}_{rMIPS}}(d) \wedge Q[P'_{S^{\pi,\theta,\xi}_{MX}}, par](d)$$
then formally we claim
$$\forall D, C, P'_{S^{\pi,\theta,\xi}_{MX}}, par.\ simh(D, C, P'_{S^{\pi,\theta,\xi}_{MX}}, par) \Longrightarrow safety\big(D, \tilde{P}_{S'^{\pi,\theta,\xi}_{rMIPS}}, suit\big)$$
where according to the shorthands from Section 2.3.2 and the instantiation of the sequential simulation framework $R^{S_{MX}}_{S_{rMIPS}}(\pi,\theta,\xi)$ the suitability $suit(\omega)$ for $\omega \in \Theta^*_{S'^{\pi,\theta,\xi}_{rMIPS}}$ is
$$suit(\omega) \equiv \forall \beta \in \omega.\ R^{S_{MX}}_{S'_{rMIPS}}(\pi,\theta,\xi).suit(\beta.in)$$
6.2.4.2 Sketch for the Application of Store Buffer Reduction
Finally, in order to apply the store buffer reduction one has to consider an easy simulation between the SB reduced model $S_{rMIPS}$ (in system mode) from Chapter 4 with the address space instantiated in the MX software conditions and the SB reduced model $S'^{\pi,\theta,\xi}_{rMIPS}$ involved in the concurrent MX machine simulation. The ownership safety policy including the property $\tilde{P}_{S'^{\pi,\theta,\xi}_{rMIPS}}$ from Corollary 6.1 must be transferred. Moreover, software conditions required for the store buffer reduction are derived from the well-behaviour in $S'^{\pi,\theta,\xi}_{rMIPS}$.
Both models slightly differ in the configuration and the instantiation of the IO-points. In the store buffer reduction we treat any lw instruction as an IO-operation. On the other hand, the SB machine considered in the concurrent MX simulation takes into account only those lw which are marked as volatile accesses by the compiler. Since we require the ownership safety on the level $S'^{\pi,\theta,\xi}_{rMIPS}$ having fewer IO-points, it is easy to prove that the safety holds also for $S_{rMIPS}$.
Note, however, that we proved the store buffer reduction in a general case without restrictions on the configurations and suitability. We also required that the software conditions and safety hold for any executions of the reduced machine. As we have just shown by Corollary 6.1, we can transfer such properties only for suitable schedules. Moreover, in the well-formedness for $S'^{\pi,\theta,\xi}_{rMIPS}$ we require that all processors are in system mode and the store buffers are empty. Since the application of the store buffer reduction considered in this thesis is rather a technical task, we leave it for future work. The simulation and property transfer for the store buffer reduction wrt. a more sophisticated programming policy can be found in [Che16].
7 Semantics and Correctness of Concurrent Extended Mixed Machine for MIPS-86
In Chapters 5–6 we introduced the mixed machine semantics and considered in detail its com-
piler correctness and the soundness of the concurrent MX model implemented on the multi-core
MIPS-86 machine where each processor executes the compiled code of programs without inline
assembly portions.
As mentioned before, however, system programming requires more low-level operations than the MX semantics permits. Such operations can be implemented by using inline assembly in the mixed programs. This chapter is devoted to the development of the semantics for this extended language and the justification of its concurrent version. We call the abstract machine able to execute the mixed programs enriched with inline assembly the extended mixed machine, or simply the MXA machine, standing for MX+Assembly.
The idea of such a model, suggested by Prof. Wolfgang J. Paul and then studied indepen-
dently in [PBLS15] for sequential C0+Assembly, is very intuitive. In fact, since we have already
claimed the correctness of the sequential MX machine, one can easily consider the execution of
the abstract MX machine as long as its compiled code runs on the processor. When an inline
assembly portion is encountered in the MASM program, we can easily switch to the MIPS-86
configuration coupled by the MX compiler consistency relation. Therefore, one can continue
the execution on this level until an MX consistency point at which an abstract MX configuration
exists is reached. In order to get such an abstract configuration, one has to reconstruct a corre-
sponding MX thread configuration from the MIPS-86 configuration, the executed program, and
the compiler information.
Though the idea of the model considered in this thesis is similar to [PBLS15], the configuration and the semantics of our MXA machine differ for several reasons besides the use of the MX language instead of C0. First of all, we have to justify the concurrent MXA model implemented by arbitrarily interleaved steps of the hypervisor / OS kernel and user processes / guests respectively. For that purpose, we have to apply the theory of concurrent simulation and order reduction for Cosmos machines, which, in turn, leaves less freedom in the formal definitions, since they have to match this theory. Another reason is the context in which our MXA model will be used. Note that, in contrast to [PBLS15], where only one stack for the OS kernel is considered, we argue about many stacks that can be added and deleted.
We proceed in the same way as it was done in the previous two chapters. Before we define
the MXA machine transition function, we consider the reconstruction formally in detail.
7.1 Sequential Extended Mixed Machine (MXA) Semantics
7.1.1 MXA Programs and Environment Parameters
Since in Section 5.1.1 we have already introduced MASM programs containing inline assembly, we will consider the same MX programs $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$ here. Obviously, the C-IL environment parameters $\theta \in Params_{CIL}$ are also as before.
7.1.2 Machine Configuration
In the definition of the semantics for the concurrent mixed machine from Section 5.4 we intro-
duced the notion of the MX thread and its configuration. In the model of the extended mixed
machine such a thread can be additionally represented by a processor configuration.
Definition 7.1 (MXA Thread Configuration). A configuration of the MXA thread is either the MIPS-86 processor configuration, or a tuple consisting of the MX thread configuration, base address and maximal size of the current stack abstracted in this configuration, and the configuration of the TLB.
$$K_{MXA} \overset{def}{\equiv} (k_{mx} \in K_{MX},\ sba \in \mathbb{B}^{32},\ mss \in \mathbb{N},\ tlb \in C_{tlb}) \cup C_{proc}$$
Though the MX thread cannot explicitly operate on the TLB, its configuration is included as
a component so that it can be preserved during MX steps and later be used for the processor
configuration.
Definition 7.2 (Sequential MXA Configuration). The sequential MXA machine configuration is the MXA thread configuration accompanied by the byte addressable memory.
$$C_{MXA} \overset{def}{\equiv} (k \in K_{MXA},\ M : \mathbb{B}^{32} \to \mathbb{B}^{8})$$
Definition 7.3 (Type of MXA Thread). Depending on the MXA thread configuration $k \in K_{MXA}$, we distinguish whether the MXA machine makes MX or processor steps by the predicates
$$isa(k) \overset{def}{\equiv} k \in C_{proc} \qquad mx(k) \overset{def}{\equiv} \neg isa(k)$$
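The union type of Definition 7.1 together with the two predicates can be sketched as a tagged union. The class and field names below are illustrative assumptions; in particular, the components of the processor configuration are reduced to opaque placeholders and do not reproduce the MIPS-86 definitions.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class ProcConfig:
    """Stands in for an element of C_proc (fields are placeholders)."""
    core: Any
    sb: Any
    tlb: Any


@dataclass
class MxThreadConfig:
    """The tuple (kmx, sba, mss, tlb) of Definition 7.1."""
    kmx: Any
    sba: int
    mss: int
    tlb: Any


MxaThread = Union[ProcConfig, MxThreadConfig]


def isa(k: MxaThread) -> bool:
    """True if the thread is represented by a processor configuration."""
    return isinstance(k, ProcConfig)


def mx(k: MxaThread) -> bool:
    """True if the thread is represented by an abstract MX configuration."""
    return not isa(k)
```

Keeping `tlb` inside the MX variant mirrors the remark above: the MX steps never touch it, but it must survive until the thread is turned back into a processor configuration.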
Definition 7.4 (MX and SB Reduced Single Core MIPS-86 Configurations from MXA Configuration). For a given configuration $c \in C_{MXA}$ performing mixed machine semantics steps, i.e., the predicate $mx(c.k)$ holds, we can compose the corresponding sequential MX configuration by the function
$$conf^{MXA}_{MX}(c) \overset{def}{\equiv} conf_{MX}(c.k.k_{mx}.ac,\ c.k.k_{mx}.ic,\ c.k.k_{mx}.spr,\ c.M)$$
Otherwise, if the MXA thread configuration is MIPS-86 ISA, one simply gets
$$conf^{MXA}_{rMIPS}(c) \overset{def}{\equiv} (c.k,\ c.M)$$
Definition 7.5 (Well-Formed MXA Configuration). We call an MXA configuration $c \in C_{MXA}$ well-formed if its MXA thread configuration is well-formed, namely
$$wfconf^{\pi,\theta,cba}_{MXA}(c) \overset{def}{\equiv}$$
(i) $mx(c.k) \Longrightarrow wfconf^{\pi,\theta}_{MX}\big(conf^{MXA}_{MX}(c)\big)$
(ii) $isa(c.k) \wedge \neg mode(c.k.core) \Longrightarrow c.k.sb = \varepsilon$
(iii) $consis^{code}_{MX}(c.M, \pi, info, cba)$
with $info = (info_{cil}, info_\mu) \in infoT_{MX}$, computed as $info_\mu = cmpl_{MASM}(\pi_\mu)$ and $info_{cil} = cmpl_{CIL}(\pi_{cil})$.
The last condition is added explicitly because we will treat the code region as read-only mem-
ory later on. Therefore, for the MXA semantics we have to require that the compiled code of the
MX program with inline assembly is always in the memory.
7.1.3 Stack Information Abstraction
The configuration of the extended mixed machine contains the stack base address and its maximal size when the mixed machine steps are performed. When inline assembly starts, we havoc this information and allow the low-level operations on the stack to do whatever is needed for system programming. As soon as the control reaches the compiled code of the C-IL and MASM programs excluding inline assembly portions, the stack is supposed to be reconstructed. Since the stack may have been substituted in the meantime, we need to find a matching pair of the stack base address and its length existing somewhere in system data structures in the memory.
Since on this level of abstraction we are not aware of concrete data structures containing the information about stacks, e.g., process control block, thread local storage, etc., we will operate here with general notions of stack information and stack information regions. In our interpretation
the stack information includes all pairs of base addresses and maximal sizes of all stacks present
in the system. In turn, a memory region containing the information about a single stack is called
the stack information region. Each such region is identified by its base address.
The stack information can be stored in the memory in different ways (e.g., linked lists, arrays, etc.), and can be added and deleted by the programmer. Since we do not model such operations in the extended MX semantics, we introduce a stack information abstraction that allows retrieving from the memory all available pairs of stack base addresses and maximal sizes every time they are needed. To apply the semantics to a particular case, one has to instantiate this abstraction wrt. its implementation.
Definition 7.6 (Stack Information Abstraction).
• The base address of the stack information in the memory is the starting address of the first stack information region and is given by the constant
StIba ∈ B32
• The address of the next stack information region is computed by the uninterpreted function
StIbanext(a,m) ∈ B32 ∪ {⊥}
on the basis of the memory configuration m ∈ Cm and a base address a ∈ B32 of a given stack information region. If the next region does not exist, the function returns ⊥.
• The partial functions retrieving the stack base address and maximal stack size from a given stack information region identified by its base address are modeled as
sba(a,m) ∈ B32 mss(a,m) ∈ N
• In addition to these parameters we introduce the constant
StIsize ∈ N
representing the size of a single stack information region in bytes. This size is not needed for the semantics and will be used later for the definition of conditions required for the correctness argumentation. Obviously, we assume that the addresses at which the sba and mss reside lie inside the stack information region.
7.1.4 Base Address and Maximal Size of the Current Stack
In order to find a matching pair of the stack base address and its maximal size we proceed as
follows.
Definition 7.7 (Stack Information Computation). First, we recursively collect the stack information starting from a given stack information region identified by an address $a \in \mathbb{B}^{32} \cup \{\bot\}$:
$$StI(a, m) \overset{def}{\equiv} \begin{cases} (\varepsilon, \varepsilon) & : a = \bot \\ \big(sbas' \circ sba(a,m),\ msss' \circ mss(a,m)\big) & : \text{otherwise} \end{cases}$$
with $(sbas', msss') \equiv StI(\mathit{StIbanext}(a,m), m)$.
Obviously, for retrieving the information about all stacks available in the memory $m$ one calculates two sequences $(sbas, msss) = StI(\mathit{StIba}, m)$ and requires that all found stacks in the memory do not overlap by the predicate $valid^{stacks}_{MX}(|sbas|, sbas, msss)$ (see Section 6.2.1).
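The recursion of Definition 7.7 can be sketched as follows. Since the accessor functions are uninterpreted in the thesis, they are passed in as parameters here; the example instantiation in the usage note (each region stored as a triple of next address, stack base address and maximal size) is purely hypothetical, and `None` stands for ⊥.

```python
def sti(a, m, next_region, sba_of, mss_of):
    """Collect (sbas, msss) from the chain of stack information regions
    starting at base address a; None plays the role of the bottom value."""
    if a is None:
        return [], []
    sbas, msss = sti(next_region(a, m), m, next_region, sba_of, mss_of)
    # As in the definition, the current entry is appended after the
    # entries collected from the remaining regions.
    return sbas + [sba_of(a, m)], msss + [mss_of(a, m)]
```

For instance, with `m = {0x100: (0x200, 0x1FFF, 4096), 0x200: (None, 0x2FFF, 4096)}` and accessors that read the first, second and third component of `m[a]`, the call starting at `0x100` yields the two stacks with the entry of the last region first.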
Second, if possible, we choose the result by comparing the collected stacks with the contents of the stack and base pointer registers of the MIPS-86 processor core.
Definition 7.8 (Search of Base Address and Maximal Size of the Current Stack). Given values of the stack and base pointers $spv, bpv \in \mathbb{B}^{32}$ residing in the GPRs, the memory $m \in C_m$, and the base address $\mathit{StIba}$ of the stack information in $m$, we define the function
$$R_{StI}(spv, bpv, m, \mathit{StIba}) \in (\mathbb{B}^{32} \times \mathbb{N}) \cup \{\bot\}$$
such that on the basis of the stack information $(sbas, msss) = StI(\mathit{StIba}, m)$ and the set of pairs
$$StI^{(sbas,msss)}_{pair}(spv, bpv) \overset{def}{\equiv} \left\{ (sba, mss) \ \middle|\ \begin{array}{l} (\exists i \in \mathbb{N}.\ sba = sbas_i \wedge mss = msss_i)\ \wedge \\ spv, bpv \in A^{stack}_{MX}(sba, mss) \wedge \langle spv \rangle \le \langle bpv \rangle \end{array} \right\}$$
the result is computed as
$$R_{StI}(spv, bpv, m, \mathit{StIba}) \overset{def}{\equiv} \begin{cases} \epsilon\, StI^{(sbas,msss)}_{pair}(spv, bpv) & : StI^{(sbas,msss)}_{pair}(spv, bpv) \neq \emptyset \\ \bot & : \text{otherwise} \end{cases}$$
One can easily prove that if all found stacks in the memory do not overlap, one can find at most one pair $(sba, mss)$ corresponding to the given stack and base pointers.
Lemma 7.1 (Uniqueness of Found sba and mss). For any $m$, $\mathit{StIba}$, $spv$, $bpv$ from above, and computed non-empty $(sbas, msss) = StI(\mathit{StIba}, m)$ one has
$$valid^{stacks}_{MX}(|sbas|, sbas, msss) \Longrightarrow \#StI^{(sbas,msss)}_{pair}(spv, bpv) \le 1$$
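The search for the current stack can be illustrated by the sketch below. It assumes, for illustration only, that a stack region consists of the `mss` bytes ending at `sba`; the actual definition of the stack address range is given elsewhere in the thesis. `None` stands for ⊥, and the choice among several candidates (which the lemma rules out for non-overlapping stacks) is left arbitrary.

```python
def in_stack(addr, sba, mss):
    """ASSUMPTION: the stack occupies the mss bytes ending at sba."""
    return sba - mss < addr <= sba


def r_sti(spv, bpv, sbas, msss):
    """Return a pair (sba, mss) whose stack contains both pointers
    with spv <= bpv, or None if no such pair exists."""
    pairs = {(s, n) for s, n in zip(sbas, msss)
             if in_stack(spv, s, n) and in_stack(bpv, s, n) and spv <= bpv}
    # With non-overlapping stacks the set has at most one element.
    return next(iter(pairs)) if pairs else None
```

The condition `spv <= bpv` reflects that the stack pointer never lies above the base pointer of the topmost frame.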
7.1.5 MX Thread Configuration Reconstruction
Given a single core MIPS-86 configuration $d \in C_{MIPS}$, the mixed program $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$, the parameters $\theta \in Params_{CIL}$, the compiler information $info = (info_{cil}, info_\mu) \in infoT_{MX}$, computed as $info_\mu = cmpl_{MASM}(\pi_\mu)$ and $info_{cil} = cmpl_{CIL}(\pi_{cil})$, the stack base address $sba \in \mathbb{B}^{32}$ and the code base addresses $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$ such that $valid^{code}_{MX}(info, cba)$ holds, one can reconstruct a corresponding well-formed MX thread configuration of type $K_{MX}$ according to the MX compiler consistency.
Starting from the code address in the program counter and the frame base address from the general purpose register bp, we traverse the physical stack residing in the MIPS-86 memory from its top until we reach the first frame at its bottom. So, the reconstruction of the MX thread includes the following steps:
• collect the base addresses of all frames on the stack and determine current locations and
functions/procedures which the frames belong to,
• using this control information traverse the stack in the memory and compose the abstract
full MX stack,
• transform the MX stack into a sequence of MX execution contexts and compose the MX
thread configuration.
Note, such a reconstruction is not always possible, e.g., in case an existing stack in the memory
was destroyed, or a system programmer prepared a new stack with a wrong layout. We will
indicate the result of the failed reconstruction with ⊥.
Reconstruction of Control Information
First, we introduce two simple predicates indicating that a given function/procedure is defined in the C-IL or MASM program respectively:
$$masmp(p, \pi_\mu) \overset{def}{\equiv} p \in dom(\pi_\mu) \wedge \neg ext(p, \pi_\mu)$$
$$cilf(f, \pi_{cil}, \theta) \overset{def}{\equiv} f \in dom(\mathcal{F}^\theta_{\pi_{cil}}) \wedge \neg ext(f, \pi_{cil}, \theta)$$
Definition 7.9 (Code Address in a MASM Procedure). Given a code address $a \in \mathbb{B}^{32}$ (corresponding to the value of the program counter or the return address on the stack) and a pair $(p, loc) \in P_{name} \times \mathbb{N}$ of the MASM procedure name $p$ and the location $loc$ inside its body we define a predicate indicating whether the address $a$ corresponds to $(p, loc)$:
$$proc_{\pi_\mu}(p, loc, a, info_\mu, cba_\mu) \overset{def}{\equiv}$$
(i) $masmp(p, \pi_\mu)$
(ii) $loc \in [1 : |\pi_\mu(p).body|]$
(iii) $a = ca_{MASM}(p, loc, info_\mu, cba_\mu)$
Definition 7.10 (Code Address in a C-IL Function). Analogously, for $(f, loc) \in F_{name} \times \mathbb{N}$ we define a predicate showing that the code address $a \in \mathbb{B}^{32}$ is either the starting address of the compiled code of a C-IL statement at the location $loc$ in the C-IL function $f$ or the return address pointing to the epilogue of a function call at the location¹ $loc - 1$ in $f$:
$$func_{\pi_{cil},\theta}(f, loc, a, info_{cil}, cba_{cil}) \overset{def}{\equiv}$$
(i) $cilf(f, \pi_{cil}, \theta)$
(ii) $loc \in [1 : |\pi_{cil}.\mathcal{F}(f).\mathcal{P}|]$
(iii) $a = ca_{CIL}(f, loc, info_{cil}, cba_{cil}) \ \vee\ a = rca_{CIL}(f, loc - 1, info_{cil}, cba_{cil})$
¹ Recall that the location loc of non-topmost C-IL frames points to the next statement after the function call.
Definition 7.11 (Code Address in a Function/Procedure Compiled Code). Combining both definitions for $(f, loc) \in (F_{name} \cup P_{name}) \times \mathbb{N}$, $info$, $cba$ from above, and a mixed machine program $\pi$ we simply get
$$match^{\pi,\theta}_{ctrl}(f, loc, a, info, cba) \overset{def}{\equiv} proc_{\pi_\mu}(f, loc, a, info_\mu, cba_\mu) \vee func_{\pi_{cil},\theta}(f, loc, a, info_{cil}, cba_{cil})$$
In order to stop traversing the stack, we have to detect whether a frame base address points
to the bottom frame on the stack.
Definition 7.12 (Bottom Frame on C-IL Stack). Given the stack base address $sba \in \mathbb{B}^{32}$, and a base address $ba \in \mathbb{B}^{32}$ of a stack frame of a non-external C-IL function $f \in F_{name}$, we can test whether this frame is the bottom frame on the stack by the following comparison based on the physical layout of the C-IL stack:
$$botfr^{\pi_{cil},\theta}_{CIL}(sba, ba, f) \overset{def}{\equiv} \langle ba \rangle + size^{\pi_{cil},\theta}_{pars}(f) + 4 \cdot 2 - 1 = \langle sba \rangle$$
Definition 7.13 (Bottom Frame on MASM Stack). Analogously, for a non-external MASM procedure $p \in P_{name}$ we define
$$botfr^{\pi_\mu}_{MASM}(sba, ba, p) \overset{def}{\equiv} \langle ba \rangle + 4 \cdot \pi_\mu(p).npar + 4 \cdot 2 - 1 = \langle sba \rangle$$
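Both bottom-frame tests are plain address arithmetic and can be written down directly. In the sketch below the function names and the constant `WORD` are illustrative; the interpretation of the term 4 · 2 as two words stored above the frame base is an assumption suggested by the functions ra and pbp used in the control reconstruction, not a statement taken from the stack layout definition.

```python
WORD = 4  # bytes per machine word


def botfr_masm(sba, ba, npar):
    """Bottom-frame test for a MASM procedure with npar word-sized parameters."""
    return ba + WORD * npar + WORD * 2 - 1 == sba


def botfr_cil(sba, ba, size_pars):
    """Bottom-frame test for a C-IL function whose parameters take size_pars bytes."""
    return ba + size_pars + WORD * 2 - 1 == sba
```

For example, with a stack base address of 0x2FFF and a MASM procedure with two parameters, the bottom frame must have the base address 0x2FF0, since 0x2FF0 + 8 + 8 − 1 = 0x2FFF.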
Finally, we can introduce a recursive function collecting the frame base addresses and simul-
taneously reconstructing the control information for every frame.
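Definition 7.14 (Control Information Reconstruction Function). For the MIPS-86 memory $m \in C_m$, frame base address $ba \in \mathbb{B}^{32}$, and a compiled code address $ca \in \mathbb{B}^{32}$ along with $info$, $cba$, $sba$ from above, we define the function
$$R^{\pi,\theta}_{ctrl}(m, info, cba, sba, ca, ba) \in \big( (\mathbb{B}^{32})^* \times ((F_{name} \cup P_{name}) \times \mathbb{N})^* \big) \cup \{\bot\}$$
such that using the shorthands
$$Ctrl \equiv Ctrl^{\pi,\theta}(ca, info, cba) \overset{def}{\equiv} \big\{ (f', loc') \ \big|\ match^{\pi,\theta}_{ctrl}(f', loc', ca, info, cba) \big\}$$
$$ctrl' \equiv R^{\pi,\theta}_{ctrl}(m, info, cba, sba, ra(m, ba), pbp(m, ba))$$
$$bfr_{cil} \equiv botfr^{\pi_{cil},\theta}_{CIL}(sba, ba, f) \qquad bfr_\mu \equiv botfr^{\pi_\mu}_{MASM}(sba, ba, f)$$
the result is recursively computed as
$$R^{\pi,\theta}_{ctrl}(m, info, cba, sba, ca, ba) \overset{def}{\equiv} \begin{cases} (ba, (f, loc)) & : (f, loc) = \epsilon\, Ctrl \ \wedge \\ & \phantom{:}\ (masmp(f, \pi_\mu) \Longrightarrow bfr_\mu) \ \wedge \\ & \phantom{:}\ (cilf(f, \pi_{cil}, \theta) \Longrightarrow bfr_{cil}) \\ (bas' \circ ba,\ flocs' \circ (f, loc)) & : (f, loc) = \epsilon\, Ctrl \ \wedge \\ & \phantom{:}\ (masmp(f, \pi_\mu) \Longrightarrow \neg bfr_\mu) \ \wedge \\ & \phantom{:}\ (cilf(f, \pi_{cil}, \theta) \Longrightarrow \neg bfr_{cil}) \ \wedge \\ & \phantom{:}\ ctrl' = (bas', flocs') \\ \bot & : \text{otherwise} \end{cases}$$
Therefore, for a given configuration $d \in C_{MIPS}$ we start the reconstruction of the control information from the values of the program counter and the base pointer in the GPRs:
$$R^{\pi,\theta}_{ctrl}(d, info, cba, sba) \overset{def}{\equiv} R^{\pi,\theta}_{ctrl}(d.m, info, cba, sba, d.cpu.core.pc, d.cpu.core.gpr(bp))$$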
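The frame walk performed by this function can be sketched as follows. The compiler-dependent parts (the set Ctrl for a code address, the bottom-frame test, and the accessors for the return address and previous base pointer) are passed in as hypothetical callables. Two simplifications are assumptions of this sketch: a code address with anything other than exactly one candidate is treated as failure, and a `fuel` bound is added so that a corrupted frame chain cannot cause unbounded recursion. `None` stands for ⊥.

```python
def r_ctrl(ctrl, is_bottom, ra, pbp, ca, ba, fuel=1024):
    """Walk the frame chain starting at code address ca and frame base ba.
    Returns (bas, flocs) with the bottom frame first, or None on failure."""
    candidates = ctrl(ca)
    if len(candidates) != 1 or fuel == 0:
        return None
    (f, loc), = candidates
    if is_bottom(ba, f):
        return [ba], [(f, loc)]
    # Continue with the caller: its return address and saved base pointer.
    rest = r_ctrl(ctrl, is_bottom, ra, pbp, ra(ba), pbp(ba), fuel - 1)
    if rest is None:
        return None
    bas, flocs = rest
    return bas + [ba], flocs + [(f, loc)]
```

Note the ordering: as in the definition, the entry of the current frame is appended after the entries of the frames below it, so the topmost frame ends up last.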
Obviously, the reconstruction of the control information is unique if for any code address we can find at most one pair $(f, loc)$. We state this requirement in the following lemma:
Lemma 7.2 (Unique Control Information Reconstruction).
$$valid^{code}_{MX}(info, cba) \Longrightarrow \forall ca \in A^{code}_{MX}(info, cba).\ \#Ctrl^{\pi,\theta}(ca, info, cba) \le 1$$
The proof of this lemma is out of the scope of this thesis, since it requires implementation details of the C-IL and MASM compilers. In fact, it should be easy to show that for any address pointing to a MIPS-86 instruction inside the compiled code of a MASM instruction or a C-IL statement except the function/procedure call, the set $Ctrl^{\pi,\theta}(ca, info, cba)$ is empty. Otherwise, we consider either a starting address of an instruction/statement or a return address pointing to the caller part of the epilogue, and find exactly one pair $(f, loc)$ corresponding to this code address.
Definition 7.15 (Valid Reconstructed Control Information). For a reconstructed list $flocs \in ((F_{name} \cup P_{name}) \times \mathbb{N})^+$ we define the predicate $valid^{\pi,\theta}_{Rctrl}(flocs) \in \mathbb{B}$ indicating whether $flocs$ is valid for the reconstruction of a corresponding MX thread configuration. Let the shorthands be
$(f, loc) \equiv flocs_{i-1}$, $(f', loc') \equiv flocs_i$, $stmt_{cil} \equiv \mathcal{F}^\theta_{\pi_{cil}}(f).\mathcal{P}[loc - 1]$, $inst_\mu \equiv \pi_\mu(f).body[loc - 1]$.
Then we define $valid^{\pi,\theta}_{Rctrl}(flocs)$ such that it requires: (i) if $f$ is a C-IL function, then at the location $loc - 1$ it must contain a call of a function of the same type as $f'$ ², (ii) if $f$ is a MASM procedure, then $f'$ is called explicitly at the same location, and (iii) for the C-IL function $f$ the return value type in $stmt_{cil}$ is the same as the return type of the C-IL function $f'$:
$$valid^{\pi,\theta}_{Rctrl}(flocs) \overset{def}{\equiv} \forall i \in [2 : |flocs|].$$
(i) $cilf(f, \pi_{cil}, \theta) \Longrightarrow stmt_{cil} \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \wedge \tau^{\pi_{cil},\theta}_f(e) = \tau^{\mathcal{F}^\theta_{\pi_{cil}}}_{fun}(f')$
(ii) $masmp(f, \pi_\mu) \Longrightarrow inst_\mu = \mathbf{call}\ f'$
(iii) $cilf(f, \pi_{cil}, \theta) \wedge stmt_{cil} = (e_0 = \mathbf{call}\ f'(E)) \wedge t_{rds} \in \{\mathbf{ptr}(t), \mathbf{array}(t, n)\} \wedge cilf(f', \pi_{cil}, \theta) \Longrightarrow t = qt2t\big(\mathcal{F}^\theta_{\pi_{cil}}(f').rettype\big)$
Note that these requirements are the minimum needed for the reconstruction. Later we will again have the software condition stating that the MX execution from the reconstructed configuration does not cause a run-time error. Therefore, the conditions present in the definition of the MX transition function do not need to be checked during the reconstruction.
Reconstruction of the MX Stack
Having the reconstructed control information $(bas, flocs)$ satisfying the predicate $valid^{\pi,\theta}_{Rctrl}(flocs)$, we can proceed with the reconstruction of the full MX stack for the MX thread configuration. In Definitions 6.6 and 6.17 we have already considered the computation of some stack frame components from the memory. Now, we complete the computation of the missing ones.
² In fact, since in C-IL a function/procedure can be called by a function pointer, the only information needed for the call and the later return to work correctly is the exact type of the function pointer which describes the parameters and the return type. Therefore, in case the stack is built by a system programmer and is not a result of the execution of the compiled mixed code, it is possible that the return from $f'$ would be performed to the next statement after $stmt$ even if the call of $f'$ is not present in $\pi$.
Definition 7.16 (Reconstruction of rds). Given are a memory $m \in C_m$ and $ba, ba' \in \mathbb{B}^{32}$, $f \in F_{name}$, $f' \in F_{name} \cup P_{name}$ corresponding to the frame base addresses and function/procedure names from Figure 6.6. Moreover, let the frame of the non-external C-IL function $f$ have an index $i \in \mathbb{N}$ and $loc \in \mathbb{N}$ be a location after the function/procedure call in the body of $f$, namely for the statement $stmt \equiv \mathcal{F}^\theta_{\pi_{cil}}(f).\mathcal{P}[loc - 1]$ we have $stmt \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\}$. Then using the shorthands $a \equiv rdsw^{\pi,\theta}(m, ba', f')$ and $t_{rds} \equiv qt2t\big(\tau^{\pi,\theta}_f(\&(e_0))\big)$ we define the computation of the C-IL return value destination $rds_{val} \in val_{ptr} \cup val_{lref} \cup \{\bot\}$ in the frame $i$ by the function
$$rds^{\pi,\theta}(m, ba, f, loc, i, ba', f', info) \overset{def}{\equiv} rds_{val}$$
such that the result depends on the following cases:
• No return value: $stmt = \mathbf{call}\ e(E)$
Then the result of the reconstruction is obviously computed as
$$rds_{val} = \bot$$
Note that the C-IL semantics does not forbid such calls of $f'$ even if the type of the function return value is not void.
• Return value address corresponds to a local reference:
– the C-IL function / MASM procedure is called in the statement with return value
$stmt = (e_0 = \mathbf{call}\ e(E))$
– there exists a pair $(v, o)$ of a local variable/parameter and an offset inside it such that the byte address computed for $(v, o)$ is equal to the destination address $a$ read from the stack.
Using the shorthands $V_f \equiv \mathcal{F}^\theta_{\pi_{cil}}(f).V$ and $npar_f \equiv \mathcal{F}^\theta_{\pi_{cil}}(f).npar$ we define
$$exist^{\pi_{cil},\theta}_{lref}(v, o, a, ba, f) \overset{def}{\equiv} \exists j \in [1 : |V_f|],\ t \in \mathbb{T}_Q.$$
(i) $(v, t) = V_f[j]$
(ii) $o \in [0 : size_\theta(qt2t(t)) - 1]$
(iii) $j \le npar_f \Longrightarrow para^{\pi_{cil},\theta}(j, ba, f) +_{32} o_{32} = a$
(iv) $j > npar_f \Longrightarrow lva(v, ba, f, info_{cil}) +_{32} o_{32} = a$
and require
$$(v, o) = \epsilon \big\{ (\hat{v}, \hat{o}) \in \mathbb{V} \times \mathbb{N}_0 \ \big|\ exist^{\pi_{cil},\theta}_{lref}(\hat{v}, \hat{o}, a, ba, f) \big\}$$
If the conditions hold, the result of the reconstruction is
$$rds_{val} = \mathbf{lref}((v, o), i, t_{rds})$$
• Return value address does not correspond to a local reference:
– the C-IL function / MASM procedure is called in the statement with return value
$stmt = (e_0 = \mathbf{call}\ e(E))$
– a local variable/parameter is not found
$$\big\{ (\hat{v}, \hat{o}) \in \mathbb{V} \times \mathbb{N}_0 \ \big|\ exist^{\pi_{cil},\theta}_{lref}(\hat{v}, \hat{o}, a, ba, f) \big\} = \emptyset$$
Under the conditions above we get
$$rds_{val} = \mathbf{val}(a, t_{rds})$$
Note that in this case $a \in \mathbb{B}^{32}$ might also be an address inside the code region or of a word on the stack where the local variables and parameters of the frame $i$ do not reside. However, we will not be interested in such reconstructions later on because the software conditions will require that the code and stack regions are not explicitly accessed by the MX machine.
Now, using the following shorthands
• control information $ctrl \equiv R^{\pi,\theta}_{ctrl}(d, info, cba, sba)$
• for non-empty $ctrl = (bas, flocs)$, $top \equiv |bas|$, and $i \in \mathbb{N}_{|ctrl|}$:
$ba_i \equiv bas_i \qquad (f_i, loc_i) \equiv flocs_i$
• for a C-IL function $f_i$:
$V_i \equiv \mathcal{F}^\theta_{\pi_{cil}}(f_i).V \qquad npar^{cil}_i \equiv \mathcal{F}^\theta_{\pi_{cil}}(f_i).npar \qquad (v_{i,j}, t_{i,j}) \equiv V_i[j]$
$r_{i,j} \equiv info_{cil}.reg_{lvar}(v_{i,j}, f_i, loc_i) \qquad crra_{i,j} \equiv crra^{\pi_{cil},\theta}(r_{i,j}, ba_i, f_i, loc_i - 1, info_{cil})$
• for a MASM procedure $f_i$:
$uses_i \equiv \pi_\mu(f_i).uses \qquad npar^{\mu}_i \equiv \pi_\mu(f_i).npar$
we can define the reconstruction of the full MX stack.
Definition 7.17 (MX Stack Reconstruction Starting from a Given Frame). For the considered $d$, $info$, $(bas, flocs)$ such that $flocs$ is valid for the reconstruction, and an index $i \in [0 : top]$ we define the auxiliary stack reconstruction function
$$\hat{R}^{\pi,\theta}_{stack}(d, info, bas, flocs, i) \in (frame_{MASM} \cup frame_{CIL})^*$$
$$\hat{R}^{\pi,\theta}_{stack}(d, info, bas, flocs, i) \overset{def}{\equiv} \begin{cases} \varepsilon & : i = 0 \\ \hat{R}^{\pi,\theta}_{stack}(d, info, bas, flocs, i - 1) \circ frame^i_{mx} & : \text{otherwise} \end{cases}$$
where the frame $frame^i_{mx}$ is reconstructed as follows:
$$frame^i_{mx} \equiv \begin{cases} frame^i_{cil} & : cilf(f_i, \pi_{cil}, \theta) \\ frame^i_\mu & : masmp(f_i, \pi_\mu) \end{cases}$$
• MASM stack frame $frame^i_\mu$:
$frame^i_\mu.p = f_i \qquad frame^i_\mu.pars = pars^{\pi_\mu}_{MASM}(d.m, ba_i, f_i)$
$frame^i_\mu.loc = loc_i \qquad frame^i_\mu.saved = saved^{\pi_\mu}_{MASM}(d.m, ba_i, f_i)$
In contrast to the frame components restored above on the basis of the procedure information, the size of the lifo has to be computed from the actual layout of the stack in memory:
\[ size_{lifo} \equiv
\begin{cases}
\langle ba_i \rangle - \langle d.cpu.core.gpr(sp) \rangle - 4 \cdot |uses_i| & : i = top \\
dist - size^{\pi_{cil},\theta}_{pars}(f_{i+1}) & : i < top \land cilf(f_{i+1}, \pi_{cil}, \theta) \\
dist - 4 \cdot npar^{\mu}_{i+1} & : i < top \land masmp(f_{i+1}, \pi_\mu)
\end{cases} \]
where
\[ dist \equiv \langle ba_i \rangle - \langle ba_{i+1} \rangle - 4 \cdot |uses_i| - 2 \cdot 4 \]
Note that $size_{lifo}$ cannot be computed correctly if the programmer has prepared the stack incorrectly. For the time being we ignore such situations and set the lifo to empty for $size_{lifo} < 0$. Later, after the stack reconstruction, we will introduce another validity condition intended to exclude such issues. So, for now we compute
\[ frame^{i}_{\mu}.lifo =
\begin{cases}
lifo^{\pi_\mu}_{MASM}(d.m, ba_i, f_i, size_{lifo}/4) & : size_{lifo} \ge 0 \\
\varepsilon & : \text{otherwise}
\end{cases} \]
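The case distinction for the lifo size can be illustrated by a minimal Python sketch. It assumes that addresses are plain integers and that the number of used registers and the size of the callee's parameters are already known; the function and parameter names are hypothetical and not part of the formal model.

```python
from typing import Optional

WORD = 4  # bytes per machine word


def dist(ba_i: int, ba_next: int, n_uses_i: int) -> int:
    """dist = <ba_i> - <ba_{i+1}> - 4*|uses_i| - 2*4."""
    return ba_i - ba_next - WORD * n_uses_i - 2 * WORD


def size_lifo(ba_i: int, n_uses_i: int, *,
              sp: Optional[int] = None,
              ba_next: Optional[int] = None,
              callee_par_bytes: Optional[int] = None) -> int:
    """Byte size of the lifo of MASM frame i (may be negative for a bad layout).

    Top frame (i = top): pass the stack pointer sp.
    Inner frame (i < top): pass ba_next and the bytes occupied by the
    parameters of frame i+1 (size_pars for C-IL, 4*npar for MASM).
    """
    if sp is not None:
        return ba_i - sp - WORD * n_uses_i
    return dist(ba_i, ba_next, n_uses_i) - callee_par_bytes


def lifo_word_count(size: int) -> int:
    """Words to read for the lifo; a negative size yields the empty lifo."""
    return size // WORD if size >= 0 else 0
```

As in the definition, a negative size is silently mapped to the empty lifo here; rejecting such layouts is left to the later validity condition.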
• C-IL stack frame $frame^{i}_{cil}$:
\[ frame^{i}_{cil}.f = f_i \qquad frame^{i}_{cil}.loc = loc_i \]
\[ frame^{i}_{cil}.rds =
\begin{cases}
rds^{\pi,\theta}(d.m, ba_i, f_i, loc_i, i, ba_{i+1}, f_{i+1}, info) & : i < top \\
\perp & : \text{otherwise}
\end{cases} \]
In order to reconstruct the local memory we simply choose the values for the local variables/parameters $v_{i,j}$ according to the compiler consistency relation:
\[ frame^{i}_{cil}.\mathcal{M}_{\mathcal{E}}(v_{i,j}) =
\begin{cases}
d.cpu.core.gpr(r_{i,j}) & : i = top \land r_{i,j} \ne \perp \\
gpr_{callee}(d.m, ba_{i+1}, f_{i+1})(r_{i,j}) & : i < top \land r_{i,j} \in Reg_{callee} \land cilf(f_{i+1}, \pi_{cil}, \theta) \\
saved^{\pi_\mu}_{MASM}(d.m, ba_{i+1}, f_{i+1})(r_{i,j}) & : i < top \land r_{i,j} \in Reg_{callee} \land masmp(f_{i+1}, \pi_\mu) \\
d.m_4(crra_{i,j}) & : i < top \land r_{i,j} \in Reg_{caller} \\
d.m_{size_\theta(qt2t(t_{i,j}))}\left(para^{\pi_{cil},\theta}(j, ba_i, f_i)\right) & : r_{i,j} = \perp \land j \le npar^{cil}_i \\
d.m_{size_\theta(qt2t(t_{i,j}))}\left(lva(v_{i,j}, ba_i, f_i, info_{cil})\right) & : r_{i,j} = \perp \land j > npar^{cil}_i
\end{cases} \]
Definition 7.18 (Full MX Stack Reconstruction). Finally, for the full MX stack reconstruction we define the function wrt. the shorthands introduced above.
\[ R^{\pi,\theta}_{stack}(d, info, cba, sba) \in (frame_{MASM} \cup frame_{CIL})^* \cup \{\perp\} \]
\[ R^{\pi,\theta}_{stack}(d, info, cba, sba) \stackrel{def}{\equiv}
\begin{cases}
\hat{R}^{\pi,\theta}_{stack}(d, info, bas, flocs, top) & : ctrl = (bas, flocs) \land valid^{\pi,\theta}_{Rctrl}(flocs) \\
\perp & : \text{otherwise}
\end{cases} \]
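The recursion of the auxiliary function and the guard of the full reconstruction can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the reconstruction of a single frame is passed in as a hypothetical callback `frame_at`, the validity check as `flocs_valid`, and `None` stands for the error value.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = dict  # placeholder for a reconstructed MASM or C-IL frame


def reconstruct_from(frame_at: Callable[[int], Frame], i: int) -> List[Frame]:
    """Auxiliary function: frames 1..i, bottom of the stack first."""
    if i == 0:
        return []  # empty sequence
    return reconstruct_from(frame_at, i - 1) + [frame_at(i)]


def reconstruct_stack(ctrl: Optional[Tuple[Sequence[int], Sequence[tuple]]],
                      flocs_valid: Callable[[Sequence[tuple]], bool],
                      frame_at: Callable[[int], Frame]) -> Optional[List[Frame]]:
    """Full reconstruction; None is returned where the definition yields the error value."""
    if ctrl is None:
        return None
    bas, flocs = ctrl
    if not flocs_valid(flocs):
        return None
    return reconstruct_from(frame_at, len(bas))  # top = |bas|
```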
As we mentioned before, in the reconstruction of frames we mostly rely on function/procedure declarations and the compiler information. The only exception is the computation of lifo, though so far we have ignored a wrong stack layout that could be created by a system programmer. In fact, during the stack reconstruction we have not checked whether each frame of the stack in memory has a size proper for the further execution of the compiled code. Without such a condition one could get reconstructed components with values taken from adjacent stack frames. We formulate this validity condition using the earlier introduced notion of the distance between frame base addresses.
Figure 7.1: C-IL context reconstruction from the full MX stack. (The figure shows a block of len C-IL frames, from the (i − len + 1)-th to the i-th frame, adjacent to MASM frames.)
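Definition 7.19 (Valid Reconstruction of MX Stack). We call a reconstructed MX stack $st = R^{\pi,\theta}_{stack}(d, info, cba, sba)$ with $st \ne \perp$ valid wrt. the valid control information $(bas, flocs) = R^{\pi,\theta}_{ctrl}(d, info, cba, sba)$ iff the following predicate holds:
\[ valid^{\pi,\theta}_{Rstack}(st, bas, d, info) \stackrel{def}{\equiv} \forall i \in \mathbb{N}_{top}.\;
dist^{\pi,\theta}_{MX}(st, i, info) =
\begin{cases}
\langle ba_i \rangle - \langle ba_{i+1} \rangle & : i < top \\
\langle ba_i \rangle - \langle d.cpu.core.gpr(sp) \rangle & : \text{otherwise}
\end{cases} \]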
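A small Python sketch of this check is given below. It assumes that the distances required by the reconstructed frames have already been computed and are passed in as a list; `expected_dist`, `bas`, and `sp` are illustrative names, and list position k corresponds to frame index k + 1.

```python
from typing import Sequence


def valid_stack(expected_dist: Sequence[int], bas: Sequence[int], sp: int) -> bool:
    """Compare the distance required by each frame with the actual layout.

    For an inner frame the actual distance is measured to the next frame base,
    for the top frame it is measured to the stack pointer.
    """
    top = len(bas)
    for k in range(top):
        lower_bound = bas[k + 1] if k + 1 < top else sp
        if expected_dist[k] != bas[k] - lower_bound:
            return False
    return True
```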
Reconstruction of Execution Contexts and MX Thread Configuration
In order to reconstruct the sequence of the MX execution contexts on the basis of the reconstructed stack we introduce an auxiliary recursive function traversing the stack from a frame with a given index.
Definition 7.20 (Reconstruction of the MX Contexts Starting from a Given Frame). Given a reconstructed non-empty MX stack $st \in (frame_{MASM} \cup frame_{CIL})^*$, a sequence of reconstructed frame base addresses $bas \in (\mathbb{B}^{32})^*$, a memory $m \in C_m$, general purpose registers $gpr : \mathbb{B}^{8} \to \mathbb{B}^{32}$, and an index $i \in [0 : top(st)]$ of the frame from which the reconstruction is performed, we define the partial function
\[ \hat{R}^{\pi,\theta}_{cntx}(st, bas, m, gpr, i) \in \left( context_{CIL} \cup context^{active}_{MASM} \cup context^{inactive}_{MASM} \right)^* \]
such that using the shorthands:
• the C-IL or MASM stack of an individual context:
\[ s \equiv st[i - len + 1 : i] \]
such that all its adjacent frames are of the same type, which is taken into account in the computation of its length (see the example for a C-IL frame with the index $i$ in Figure 7.1)
\[ len \equiv \max \left\{ l \in \mathbb{N}_i \;\middle|\; \forall j \in [i - l + 1 : i].\; (cil(st, i) \implies cil(st, j)) \land (masm(st, i) \implies masm(st, j)) \right\} \]
• the sequence of reconstructed contexts starting from the frame $i - len$:
\[ k \equiv \hat{R}^{\pi,\theta}_{cntx}(st, bas, m, gpr, i - len) \]
the result is computed as
\[ \hat{R}^{\pi,\theta}_{cntx}(st, bas, m, gpr, i) \stackrel{def}{\equiv}
\begin{cases}
\varepsilon & : i = 0 \\
k \circ s & : i > 0 \land cil(st, i) \\
k \circ (s, gpr') & : i = top(st) \land masm(st, i) \\
k \circ (s, gpr'') & : 0 < i < top(st) \land masm(st, i) \land cil(st, i + 1) \\
\text{undefined} & : \text{otherwise}
\end{cases} \]
where the GPRs of active and inactive MASM contexts are
\[ gpr' \equiv gpr|_{\mathbb{B}^{8} \setminus \{sp, bp, ra\}} \qquad gpr'' \equiv gpr^{\pi_{cil},\theta}_{callee}(m, bas_{i+1}, f_{i+1}(st)) \]
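When the function is applied at the top index, the recursion always lands on the boundaries of maximal runs of frames of the same type, so the result can be computed iteratively. The following Python sketch shows this under simplifying assumptions: frames are tagged with the strings `"cil"` or `"masm"`, and the register files of MASM contexts are supplied by the caller (`gpr_active` for the active context, the hypothetical callback `gpr_callee_saved` for inactive ones).

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple


def reconstruct_contexts(stack: List[Tuple[str, object]],
                         gpr_active: Dict[str, int],
                         gpr_callee_saved: Callable[[int], Dict[str, int]]) -> list:
    """Group a reconstructed stack (bottom frame first) into execution contexts.

    Each maximal run of frames of one kind forms one context. A MASM run ending
    at the top is the active MASM context; any other MASM run is followed by a
    C-IL frame (runs are maximal) and becomes an inactive MASM context.
    """
    contexts = []
    top = len(stack)
    last = 0  # 1-based index i of the last frame of the current run
    for kind, run in groupby(stack, key=lambda tagged: tagged[0]):
        frames = [frame for _, frame in run]
        last += len(frames)
        if kind == "cil":
            contexts.append(("cil", frames))
        elif last == top:
            contexts.append(("masm_active", frames, gpr_active))
        else:
            contexts.append(("masm_inactive", frames, gpr_callee_saved(last)))
    return contexts
```

Starting the traversal at an index inside a run of MASM frames would hit the undefined case of the definition; the sketch avoids this by always starting at the top.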
Definition 7.21 (Full Reconstruction of MX Execution Contexts). Hence, using the definition above we reconstruct the whole sequence of the MX execution contexts by the function
\[ R^{\pi,\theta}_{cntx}(d, info, cba, sba) \in \left( context_{CIL} \cup context^{active}_{MASM} \cup context^{inactive}_{MASM} \right)^* \cup \{\perp\} \]
defined for the shorthands
\[ ctrl \equiv R^{\pi,\theta}_{ctrl}(d, info, cba, sba) \]
\[ st \equiv R^{\pi,\theta}_{stack}(d, info, cba, sba) \]
\[ k \equiv \hat{R}^{\pi,\theta}_{cntx}(st, bas, d.m, d.cpu.core.gpr, top(st)) \]
as follows
\[ R^{\pi,\theta}_{cntx}(d, info, cba, sba) \stackrel{def}{\equiv}
\begin{cases}
k & : st \ne \perp \land ctrl = (bas, flocs) \land valid^{\pi,\theta}_{Rstack}(st, bas, d, info) \\
\perp & : \text{otherwise}
\end{cases} \]
As the last step of the reconstruction, we construct the MX thread configuration if possible. The last element in the computed sequence of contexts corresponds to the active execution context.
Definition 7.22 (Reconstruction of the MX Thread Configuration). The function for the MX thread configuration reconstruction is defined as
\[ R^{\pi,\theta}_{K_{MX}}(d, info, cba, sba) \in K_{MX\perp} \]
\[ R^{\pi,\theta}_{K_{MX}}(d, info, cba, sba) \stackrel{def}{\equiv}
\begin{cases}
(ac, ic, d.cpu.core.spr) & : R^{\pi,\theta}_{cntx}(d, info, cba, sba) = ic \circ ac \\
\perp & : \text{otherwise}
\end{cases} \]
such that the active context is $ac \in context_{CIL} \cup context^{active}_{MASM}$, and the sub-sequence $ic$ corresponds to the list of inactive contexts and can be empty.
7.1.6 Transition Function
In order to define the transition function for the extended mixed machine, one should know
when the MXA machine starts the inline assembly, and when it should be possible to return
back to a corresponding MX thread configuration with the abstract stack.
Definition 7.23 (Start of Inline Assembly). The predicate $start^{\pi}_{isa}(c)$ indicates that the MXA machine with a configuration $c \in C_{MXA}$ is about to start execution of inline assembly in the program $\pi$:
\[ start^{\pi}_{isa}(c) \stackrel{def}{\equiv} mx(c.k) \land masm(c_{mx}.ac) \land \exists il \in I^{+}_{ASM}.\; instr_{next}(c_\mu, \pi_\mu) = asm\{il\} \]
where the shorthands are $c_{mx} \equiv conf^{MXA}_{MX}(c)$ and $c_\mu \equiv conf_{MASM}(c_{mx})$.
Definition 7.24 (End of ISA Steps). Conversely, one can attempt to apply the reconstruction for the MXA machine with an MXA thread $k \in K_{MXA}$ represented by the MIPS-86 processor configuration if the following predicate holds:
\[ end^{\pi,\theta,cba}_{isa}(k) \stackrel{def}{\equiv} isa(k) \land \lnot mode(k.core) \land cp^{MX}_{rMIPS}(k.core, \pi, info, \theta, cba) \]
Here and further in this section we use the information $info = (info_{cil}, info_\mu) \in infoT_{MX}$, computed as before, i.e., $info_\mu = cmpl_{MASM}(\pi_\mu)$ and $info_{cil} = cmpl_{CIL}(\pi_{cil})$.
Note that, as we have seen before in the definition of the reconstruction, $end^{\pi,\theta,cba}_{isa}(k)$ only says when we can attempt to apply it. Though we may reach a consistency point, it is not always possible to return to the abstract MX thread configuration, and one needs to continue the ISA execution until one reaches another consistency point at which the reconstruction is possible. This issue and the solution were discovered by the author of this thesis at a time when the original version of C0 semantics with inline assembly [PBLS15] based on the reconstruction did not take this fact into account and, therefore, had to be corrected later.
Definition 7.25 (Stack Addresses Occupied by the Abstract Stack). Additionally, for a given MXA configuration $c \in C_{MXA}$ such that $mx(c.k)$ holds, we define a set of memory addresses that actually should be occupied by the abstract stack of the configuration $c$:
\[ A^{\pi,\theta}_{stack}(c, info, sba) \stackrel{def}{\equiv} \{sp_{top}\}_{\langle sba \rangle - \langle sp_{top} \rangle + 1} \]
with $sp_{top} \equiv sp^{\pi,\theta}_{top}(st_{MX}(c_{mx}), info, sba)$ and $c_{mx} \equiv conf^{MXA}_{MX}(c)$.
Definition 7.26 (Sequential MXA Transition Function). Finally, for a given MX program $\pi \in Prog_{MX}$ with inline assembly such that $\pi = (\pi_\mu, \pi_{cil})$, the environment parameters $\theta \in Params_{CIL}$, the code base addresses $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, and the base address of the stack information $StIba \in \mathbb{B}^{32}$, combined together into $\iota = (cba, StIba)$, the transitions of the sequential extended mixed machine are defined by the partial function
\[ \delta^{\pi,\theta,\iota}_{MXA} : C_{MXA} \times \Sigma_{MXA} \rightharpoonup C_{MXA\perp} \]
where $C_{MXA\perp} \stackrel{def}{\equiv} C_{MXA} \cup \{\perp\}$ contains the error state, coming from the mixed machine semantics. The input alphabet
\[ \Sigma_{MXA} \stackrel{def}{\equiv} \Sigma_{MX} \cup \Sigma_{A} \cup \Sigma_{MIPS} \]
\[ \Sigma_{A} \stackrel{def}{\equiv} (pc \in \mathbb{B}^{32},\; gpr : \{sp, bp, ra\} \to \mathbb{B}^{32},\; mst : \mathbb{B}^{32} \rightharpoonup \mathbb{B}^{8}) \]
includes the inputs needed for the MX machine step, or the single core MIPS-86 ISA, or, in case of switching to inline assembly, the parts of the MIPS-86 configuration not modeled in the mixed machine configuration.
Definition 7.27. For constructing a single core MIPS-86 machine configuration from $c \in C_{MXA}$ with $start^{\pi}_{isa}(c)$ holding in $c$, and $in \in \Sigma_A$ we define the function $conf^{start}_{rMIPS}(c, in) \in C_{MIPS}$ such that
\[ conf^{start}_{rMIPS}(c, in) \stackrel{def}{\equiv} d \]
\[ d.cpu.sb = \varepsilon \qquad d.cpu.tlb = c.k.tlb \]
\[ d.cpu.core = (in.pc, gpr, c.k.k_{mx}.spr) \]
\[ gpr(r) =
\begin{cases}
c.k.k_{mx}.ac.gpr(r) & : r \notin \{sp, bp, ra\} \\
in.gpr(r) & : \text{otherwise}
\end{cases} \]
\[ d.m(a) =
\begin{cases}
c.\mathcal{M}(a) & : a \notin dom(in.mst) \\
in.mst(a) & : \text{otherwise}
\end{cases} \]
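The construction above merges the abstract configuration with the input: the stack-related registers and the stack region come from the input, everything else from the MXA configuration. A minimal Python sketch, assuming dictionaries for register files and memories and using hypothetical field names, could look as follows.

```python
STACK_REGS = ("sp", "bp", "ra")  # registers not modeled in the active MASM context


def conf_start(ac_gpr, spr, tlb, mem, in_pc, in_gpr, in_mst):
    """Build a MIPS-like configuration at the start of inline assembly.

    ac_gpr: registers of the active MASM context
    in_gpr: input values for sp, bp, ra
    in_mst: input bytes for the stack region (address -> byte)
    """
    gpr = {r: v for r, v in ac_gpr.items() if r not in STACK_REGS}
    gpr.update({r: in_gpr[r] for r in STACK_REGS})
    memory = dict(mem)
    memory.update(in_mst)  # the input overrides the abstract memory on dom(in.mst)
    return {
        "core": {"pc": in_pc, "gpr": gpr, "spr": spr},
        "sb": [],  # empty store buffer
        "tlb": tlb,
        "m": memory,
    }
```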
Then, depending on the MXA machine configuration, for c ∈ CMXA, in ∈ ΣMXA we consider:
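• Mixed machine step:
\[ mx(c.k) \land in \in \Sigma_{MX} \land \lnot start^{\pi}_{isa}(c) \]
We perform the step of the corresponding MX machine
\[ c_{mx} \equiv conf^{MXA}_{MX}(c) \qquad c'_{mx} \equiv \delta^{\pi,\theta}_{MX}(c_{mx}, in) \]
and compose the new configuration of the MX thread if $c'_{mx} \ne \perp$:
\[ k'_{mx} \equiv (c'_{mx}.ac, c'_{mx}.ic, c'_{mx}.spr) \]
Hence, the result of the MXA machine step is defined as
\[ \delta^{\pi,\theta,\iota}_{MXA}(c, in) \stackrel{def}{\equiv}
\begin{cases}
\left( c.k[k_{mx} := k'_{mx}],\; c'_{mx}.\mathcal{M} \right) & : c'_{mx} \ne \perp \\
\perp & : \text{otherwise}
\end{cases} \]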
• Switch to inline assembly:
– inline assembly code is to be executed: $start^{\pi}_{isa}(c)$
– the input is $in \in \Sigma_A$
– the input memory corresponds to the actual stack region occupied by the abstract stack
\[ dom(in.mst) = A^{\pi,\theta}_{stack}(c, info, c.k.sba) \]
– the input parameters for the step satisfy the compiler consistency, namely, the predicate $suit^{\pi,\theta,cba}_{start}(c, in, info)$ defined using $d \equiv conf^{start}_{rMIPS}(c, in)$ and $c_{mx} \equiv conf^{MXA}_{MX}(c)$ holds:
\[ suit^{\pi,\theta,cba}_{start}(c, in, info) \stackrel{def}{\equiv} \]
(i) $consis^{ctrl}_{MX}(c_{mx}, d, \pi, \theta, info, cba, c.k.sba)$
(ii) $consis^{bp/sp}_{MX}(c_{mx}, d, \pi, \theta, info, c.k.sba)$
(iii) $consis^{reg}_{MX}(c_{mx}, d, \pi, \theta, info, c.k.sba)$
(iv) $consis^{stack}_{MX}(c_{mx}, d, \pi, \theta, info, c.k.sba)$
Note that the input is not unique because of the general purpose register ra not covered
by the MX compiler consistency and temporaries on the physical C-IL stack which are
purely used by the compiler and have no counterparts in the abstract C-IL configuration.
Since values of ra and temporaries are non-deterministic here, any program using the
MXA semantics has to be written in a way such that it works for any of them.
In order to define the result of the MXA step in this case we first compute the result of the execution of the first inline assembly instruction
\[ d' \equiv \delta_{rMIPS}\left( d, \left( core, w_I, w_D, 0^{256} \right) \right) \]
where $w_I = w_D = \epsilon\, C_{walk}$ are just ignored in the transition function.
Since the inline assembly may contain a single instruction or a jump to compiled code where the reconstruction might be possible, namely $end^{\pi,\theta,cba}_{isa}(d'.cpu)$ holds, we first try to find
– a new pair of the stack base address and stack maximal size
\[ StI' \equiv R_{StI}(gpr'(sp), gpr'(bp), d'.m, StIba) \]
with $gpr' \equiv d'.cpu.core.gpr$.
– a reconstructed MX thread configuration for $StI' = (sba', mss')$
\[ k'_{mx} \equiv R^{\pi,\theta}_{K_{MX}}(d', info, cba, sba') \]
– the new MXA thread configuration for $k'_{mx} \ne \perp$
\[ k' \equiv (k'_{mx}, sba', mss', d'.cpu.tlb) \]
Hence, the transition function for this case is defined as
\[ \delta^{\pi,\theta,\iota}_{MXA}(c, in) \stackrel{def}{\equiv}
\begin{cases}
(k', d'.m) & : end^{\pi,\theta,cba}_{isa}(d'.cpu) \land StI' = (sba', mss') \land k'_{mx} \ne \perp \\
(d'.cpu, d'.m) & : \text{otherwise}
\end{cases}
\qquad (7.1) \]
• Inline assembly, compiled code, or guest/process step:
– the MXA thread is represented by the ISA configuration: $isa(c.k)$
– the input matches the step:
\[ in \in \Sigma_{MIPS} \land (in = (core, w_I, w_D, eev) \implies \lnot eev[0]) \]
– the single core MIPS-86 transition is defined for some $d'$:
\[ \delta_{rMIPS}\left( conf^{MXA}_{rMIPS}(c), in \right) = d' \]
Then, using the shorthands $StI'$, $k'_{mx}$, and $k'$ from above, we define $\delta^{\pi,\theta,\iota}_{MXA}(c, in)$ by the same equation (7.1).
In all other cases the transition function $\delta^{\pi,\theta,\iota}_{MXA}(c, in)$ is undefined.
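The decision made by equation (7.1) after an ISA step can be summarized by the following Python sketch. It is only an illustration: the predicate for the end of ISA steps, the search for the stack information, and the thread reconstruction are passed in as hypothetical callbacks that return `None` where the formal functions yield the error value.

```python
def mxa_result_after_isa_step(d_next, end_isa, find_stack_info, reconstruct_thread):
    """Return the MXA configuration after an ISA step, in the spirit of (7.1).

    d_next:             next MIPS-like configuration {"cpu": {..., "tlb": ...}, "m": ...}
    end_isa(cpu):       True if a reconstruction may be attempted
    find_stack_info(d): (sba, mss) or None
    reconstruct_thread(d, sba): reconstructed MX thread or None
    """
    if end_isa(d_next["cpu"]):
        stack_info = find_stack_info(d_next)
        if stack_info is not None:
            sba, mss = stack_info
            k_mx = reconstruct_thread(d_next, sba)
            if k_mx is not None:
                # back to the abstract MX thread configuration
                return ((k_mx, sba, mss, d_next["cpu"]["tlb"]), d_next["m"])
    # otherwise the thread stays represented by the ISA configuration
    return (d_next["cpu"], d_next["m"])
```

The fallback branch covers all three reasons for staying on the ISA level: no consistency point, no matching stack information, or a failed reconstruction.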
7.2 Concurrent Extended Mixed Machine Semantics
Analogously to the concurrent MX semantics in Section 5.4, we define the concurrent MXA
machine.
Definition 7.28 (Configuration of Concurrent MXA Machine). Configurations of the concurrent extended mixed machine with $nt \in \mathbb{N}$ MXA threads are defined by the set
\[ C_{cMXA} \stackrel{def}{\equiv} (k : \mathbb{N}_{nt} \to K_{MXA},\; \mathcal{M} : \mathbb{B}^{32} \to \mathbb{B}^{8}) \]
Definition 7.29 (Stack Information Abstraction for Concurrent MXA Machine).
• For every MXA thread with an index in $\mathbb{N}_{nt}$ we consider a distinct base address of the corresponding stack information, namely, we have the sequence
\[ StIbas \in (\mathbb{B}^{32})^{nt} \]
• All other uninterpreted functions $StIba_{next}$, $sba$, $mss$ and the constant $StI_{size}$ treated as parameters for the MXA model are common for all threads.
Definition 7.30 (Sequential MXA Configuration from Concurrent MXA). For a given concurrent MXA configuration $c \in C_{cMXA}$ and $t \in \mathbb{N}_{nt}$ we define $conf_{MXA}(c, t) \in C_{MXA}$ such that
\[ conf_{MXA}(c, t) \stackrel{def}{\equiv} (c.k(t), c.\mathcal{M}) \]
Definition 7.31 (Concurrent MXA Transition Function). Now, for a given MX program $\pi \in Prog_{MX}$ with inline assembly, the environment parameters $\theta \in Params_{CIL}$, and code base addresses $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$ determined by the linker placing the compiled program $\pi$ into the memory, the transitions of the concurrent extended mixed machine are defined by the function
\[ \delta^{\pi,\theta,cba}_{cMXA} : C_{cMXA} \times \mathbb{N}_{nt} \times \Sigma_{MXA} \rightharpoonup C_{cMXA\perp} \]
such that for a configuration $c \in C_{cMXA}$, an index $t \in \mathbb{N}_{nt}$ of an MXA thread performing a step, and an input $in \in \Sigma_{MXA}$, the result of the transition is defined as
\[ \delta^{\pi,\theta,cba}_{cMXA}(c, t, in) \stackrel{def}{\equiv}
\begin{cases}
(c.k[t \mapsto c'_t.k],\; c'_t.\mathcal{M}) & : c'_t \ne \perp \\
\perp & : \text{otherwise}
\end{cases} \]
where the next sequential MXA configuration $c'_t$ is computed for $\iota_t \equiv (cba, StIbas_t)$ as
\[ c'_t \equiv \delta^{\pi,\theta,\iota_t}_{MXA}(conf_{MXA}(c, t), in) \]
If $c'_t$ does not exist (recall that the sequential MXA function is partial), then the result of the step $\delta^{\pi,\theta,cba}_{cMXA}(c, t, in)$ is undefined.
In the semantics we require that any stepping MXA thread $t$ has a well-formed configuration:
\[ wfconf^{\pi,\theta,cba}_{MXA}(conf_{MXA}(c, t)) \]
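The lifting of the sequential transition function to the concurrent machine can be sketched in Python as follows. The sketch assumes a list of thread configurations with 0-based indices, a shared memory, and a hypothetical sequential step function `seq_step` that returns `None` for the error state and raises `StepUndefined` where the partial function is undefined.

```python
class StepUndefined(Exception):
    """Raised by the sequential step where the partial function is undefined."""


def concurrent_step(c, t, inp, seq_step, cba, stibas):
    """One step of thread t of the concurrent machine.

    c:      (threads, memory), threads is a list of thread configurations
    stibas: per-thread base addresses of the stack information
    """
    threads, memory = c
    iota_t = (cba, stibas[t])
    c_next = seq_step((threads[t], memory), inp, iota_t)  # may raise StepUndefined
    if c_next is None:
        return None  # error state
    k_next, memory_next = c_next
    new_threads = list(threads)  # all other threads are unchanged
    new_threads[t] = k_next
    return (new_threads, memory_next)
```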
7.3 Compiler Correctness for the Sequential Extended Mixed
Machine
Similarly to Chapter 6 we proceed with the justification of the MXA machine encoded by the
SB reduced MIPS-86. First, one has to consider a sequential case adapted for further concurrent
context.
7.3.1 Sequential Simulation Relation
Since the MXA machine performs either MX, or MIPS-86 steps, the simulation relation is either the MX compiler consistency, or the coupling of MIPS-86 components such that the compiled code resides in the memory. Again, in order to apply this simulation in the presence of environment steps, we have to exclude the region of possibly inconsistent memory. Recall that the memory inconsistency might be caused by MX steps of another core that has not reached an MX consistency point, or by writing to a store buffer of another processor in the non-reduced ISA machine.
Definition 7.32. For any sequential MXA machine configuration $c \in C_{MXA}$, an SB reduced MIPS-86 configuration $d \in C_{MIPS}$, code base addresses $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, compiler information $info = (info_{cil}, info_\mu) \in infoT_{MX}$, and memory addresses $icm \subset A_{hyp} \setminus A^{code}_{MX}(info, cba)$ we define the sequential MXA simulation relation as
\[ consis_{MXA}(c, d, \pi, \theta, info, cba, icm) \stackrel{def}{\equiv} \]
(i) $mx(c.k) \implies consis_{MX}\left( conf^{MXA}_{MX}(c), d, \pi, \theta, info, cba, c.k.sba, c.k.mss, icm \right)$
(ii) $isa(c.k) \implies d.cpu = c.k \land \forall a \in \mathbb{B}^{32} \setminus icm.\; d.m(a) = c.\mathcal{M}(a)$
7.3.2 Consistency Points
In order to define the consistency points for the MXA machine, and the SB reduced MIPS-86
executing its compiled code, we first define the predicates indicating which code is executed:
compiled code of hypervisor / OS kernel MXA program, inline assembly portion of this code,
or some compiled code not belonging to the considered MXA program. In the last case the code
may belong to user processes / guests, or to compiled libraries used by hypervisor / OS kernel.
Definition 7.33 (Addresses of Instructions in Inline Assembly). The set of addresses of instructions in inline assembly portions of a MASM program $\pi_\mu$ whose compiled code contained in $info_\mu$ is placed in the memory at the code base address $cba_\mu$ is computed by the function
\[ A^{inline}_{MASM}(\pi_\mu, info_\mu, cba_\mu) \stackrel{def}{\equiv}
\left\{ adr_{loc,i} \;\middle|\;
\begin{array}{l}
p \in dom(\pi_\mu) \;\land \\
\lnot ext(p, \pi_\mu) \;\land \\
loc \in [1 : |\pi_\mu(p).body|] \;\land \\
\pi_\mu(p).body[loc] = asm\{il\} \;\land \\
i \in [1 : |il| - 1]
\end{array}
\right\} \]
where $adr_{loc,i} \equiv ca_{MASM}(p, loc, info_\mu, cba_\mu) +_{32} (4 \cdot i)_{32}$ is an address of an instruction in the inline assembly sequence $il$.
Note here that we exclude the address of the first instruction in $il$ because it corresponds exactly to the starting address of the compiled $asm\{il\}$.
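The set comprehension can be read as a nested loop over procedures, statements, and instruction offsets. The Python sketch below assumes a simplified program representation (a dictionary of procedures with an `external` flag and a `body` list of tagged statements) and a hypothetical callback `code_address(p, loc)` standing for the code address of the compiled statement; none of these names come from the formal model.

```python
def inline_addresses(procs, code_address):
    """Addresses of all but the first instruction of every inline assembly block.

    procs:        name -> {"external": bool, "body": [statement, ...]}
                  an inline assembly statement is ("asm", [instr, ...])
    code_address: (procedure name, 1-based location) -> start address of the
                  compiled statement
    """
    addresses = set()
    for p, decl in procs.items():
        if decl["external"]:
            continue
        for loc, stmt in enumerate(decl["body"], start=1):
            if stmt[0] != "asm":
                continue
            instructions = stmt[1]
            base = code_address(p, loc)
            # i ranges over [1 : |il| - 1]; offset 0 is the start of asm{il} itself
            for i in range(1, len(instructions)):
                addresses.add((base + 4 * i) % 2**32)
    return addresses
```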
Figure 7.2: Consistency points for the MXA machine and the MIPS machine implementing it. The dots depict the consistency points chosen for both machines such that the unnamed points are additional ones along with those inserted by the MX compiler. The arrows between the machines represent the consistency relation, where the shaded dashed ones denote that the coupling holds but is of no interest to us. (The figure shows an MXA run consisting of MX steps, inline assembly, execution of compiled code, and MX steps, coupled by $consis_{MXA}$ with the MIPS run, and marks the blocks implementing and not implementing abstract MX steps.)
Definition 7.34 (Processor Runs Inline Assembly). Then, we define that the processor core with a configuration $core \in C_{core}$ executes inline assembly by
\[ inline(core, \pi_\mu, info_\mu, cba_\mu) \stackrel{def}{\equiv} core.pc \in A^{inline}_{MASM}(\pi_\mu, info_\mu, cba_\mu) \]
Definition 7.35 (Processor Runs Inside/Outside Hypervisor/OS Kernel Code). Similarly, we test whether the core runs the compiled code of the hypervisor / OS kernel MXA program or not by the predicates
\[ inside(core, info, cba) \stackrel{def}{\equiv} core.pc \in A^{code}_{MX}(info, cba) \]
\[ outside(core, info, cba) \stackrel{def}{\equiv} \lnot inside(core, info, cba) \]
Now, we can choose consistency points for the extended mixed machine. These consistency points will be treated as interleaving points when we apply the order reduction for the arbitrarily interleaved steps of MIPS implementing our MXA machine and consider the concurrent simulation. At first sight, when the extended machine makes MX steps, one would like to consider the MX consistency points. In all other cases, since we would be able to couple the MIPS configuration present in the MXA machine with the one executing the code, one might wish to have this consistency in every step. A corresponding computation of consistency points must also be applied on the MIPS-86 level. However, things are more complex.
In fact, reaching an MX consistency point does not guarantee that we manage to reconstruct a corresponding MX thread configuration. A typical case will be considered later when we switch to a newly created stack where a return address residing on the stack is the starting address of a callee's epilogue instead of a caller's prologue. In this case one continues MIPS-86 ISA steps until another consistency point is reached and tries to apply the reconstruction again (see the execution of the compiled code in Figure 7.2).
If one would like to have a consistency point at every step in both machines during the execution of the compiled code not implementing abstract MX steps, it would be possible to detect it from the MXA thread configuration, but not in the implementing MIPS-86 machine. The reason is that one would need to distinguish between steps in the blocks implementing and not implementing MX steps. In contrast to the latter case, in the former case we have to consider only the points inserted by the compiler. In order to detect the case purely on the MIPS-86 level, one has to test whether the reconstruction is possible and keep a ghost history indicating whether we still implement the abstraction, which, in turn, requires taking into account the memory region occupied by the stack. Using such memory addresses for detecting the interleaving and consistency points, however, is not allowed by the Cosmos model and sequential simulation framework from Chapter 2.
Therefore, we will require the simulation relation to hold at every MIPS-86 step only when the processor executes inline assembly or some code outside of our compiled program. This means that even if a corresponding abstract MX step does not exist during the execution of the compiled code, we will still use only the consistency points inserted by the compiler and consider the simulation at these points (see the non-dashed arrows in Figure 7.2), though the trivial coupling of the abstract and implementing machines is possible in between.
Definition 7.36 (Consistency Points for MXA Machine). Given an MXA thread configuration $k \in K_{MXA}$ and $\pi$, $info$, $\theta$, $cba$, we define whether the corresponding MXA machine is at the consistency point by the predicate
\[ cp_{MXA}(k, \pi, info, \theta, cba) \stackrel{def}{\equiv} \]
(i) $mx(k) \implies cp_{MX}(k.k_{mx}.ac, info)$
(ii) $isa(k) \implies outside(k.core, info, cba) \lor inline(k.core, \pi_\mu, info_\mu, cba_\mu) \lor cp^{MX}_{rMIPS}(k.core, \pi, info, \theta, cba)$
For a configuration $c \in C_{MXA}$ we simply overload the definition for brevity as
\[ cp_{MXA}(c, \pi, info, \theta, cba) \stackrel{def}{\equiv} cp_{MXA}(c.k, \pi, info, \theta, cba) \]
Definition 7.37 (Consistency Points for MIPS-86 ISA Executing MXA). Analogously, for the reduced MIPS-86 machine with a processor core configuration $core \in C_{core}$ we define
\[ cp^{MXA}_{rMIPS}(core, \pi, info, \theta, cba) \stackrel{def}{\equiv} outside(core, info, cba) \lor inline(core, \pi_\mu, info_\mu, cba_\mu) \lor cp^{MX}_{rMIPS}(core, \pi, info, \theta, cba) \]
Then, for $d \in C_{MIPS}$ we rewrite
\[ cp^{MXA}_{rMIPS}(d, \pi, info, \theta, cba) \stackrel{def}{\equiv} cp^{MXA}_{rMIPS}(d.cpu.core, \pi, info, \theta, cba) \]
7.3.3 Accessed Addresses
Definition 7.38 (Memory Addresses for Stack Information Retrieving). Similarly to the computation of the stack information in Definition 7.7, for a given $a \in \mathbb{B}^{32} \cup \{\perp\}$ we recursively collect all byte addresses occupied by stack information regions
\[ A_{StI}(a, m) \stackrel{def}{\equiv}
\begin{cases}
\emptyset & : a = \perp \lor a = 0^{32} \\
\{a\}_{StI_{size}} \cup A_{StI}(StIba_{next}(a, m), m) & : \text{otherwise}
\end{cases} \]
Therefore, $A_{StI}(StIba, m)$ contains addresses belonging to all stack information regions.
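The recursion walks a linked list of stack information regions and collects the bytes of each region. A minimal Python sketch is shown below, assuming integer addresses, `None` for the undefined address, and a hypothetical callback `next_base(a, mem)` for the uninterpreted successor function. The guard against revisiting a region is an addition of this sketch and not part of the definition.

```python
def byte_region(a, n):
    """The n consecutive byte addresses starting at a (modulo 2^32)."""
    return {(a + k) % 2**32 for k in range(n)}


def stack_info_addresses(a, mem, next_base, sti_size):
    """Collect the addresses of all stack information regions reachable from a."""
    addresses = set()
    visited = set()
    while a is not None and a != 0 and a not in visited:
        visited.add(a)  # sketch-only guard against cyclic lists
        addresses |= byte_region(a, sti_size)
        a = next_base(a, mem)
    return addresses
```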
Definition 7.39 (Memory Addresses Accessed for Reading and Writing during an MXA Machine Step). For a configuration $c \in C_{MXA}$ and input $in \in \Sigma_{MXA}$ such that the extended machine step $\delta^{\pi,\theta,\iota}_{MXA}(c, in)$ with $\iota \equiv (cba, StIba)$ is defined we compute sets $reads^{\pi,\theta}_{MXA}(c, in, cba, StIba)$, $writes^{\pi,\theta}_{MXA}(c, in)$ of addresses at which the memory is read and written during the MXA step. So, using the shorthands
\[ c_{mx} \equiv conf^{MXA}_{MX}(c) \qquad d \equiv conf^{MXA}_{rMIPS}(c) \qquad d' \equiv \delta_{rMIPS}(d, in) \]
\[ \hat{d} \equiv conf^{start}_{rMIPS}(c, in) \qquad \hat{in} \equiv \left( core, \epsilon\, C_{walk}, \epsilon\, C_{walk}, 0^{256} \right) \qquad \hat{d}' \equiv \delta_{rMIPS}(\hat{d}, \hat{in}) \]
and for any $x \in C_{MIPS}$ with $gpr_x \equiv x.cpu.core.gpr$
\[ sti(x, StIba) \equiv R_{StI}(gpr_x(sp), gpr_x(bp), x.m, StIba) \]
the set $reads^{\pi,\theta}_{MXA}(c, in, cba, StIba)$ is obtained as (i) the MX reads-set for a mixed machine step, and (ii) in case of the switch to inline assembly or a pure ISA step it includes (a) the reads-set of the MIPS-86 model if there is no end of ISA steps, (b) this reads-set together with addresses for stack information if the end of ISA steps is reached and a matching base address of the stack and its maximal size are not found, and (c) if they are, $reads^{\pi,\theta}_{MXA}(c, in, cba, StIba)$ is computed as the MIPS-86 reads-set together with addresses for stack information and the stack:
\[ reads^{\pi,\theta}_{MXA}(c, in, cba, StIba) \stackrel{def}{\equiv}
\begin{cases}
reads_{MX}(c_{mx}, \pi, \theta) & : mx(c.k) \land \lnot start^{\pi}_{isa}(c) \\
reads_{MIPS}(\hat{d}, \hat{in}) & : start^{\pi}_{isa}(c) \land \lnot end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \\
reads_{MIPS}(\hat{d}, \hat{in}) \cup A_{StI}(StIba, \hat{d}'.m) & : start^{\pi}_{isa}(c) \land end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \land sti(\hat{d}', StIba) = \perp \\
reads_{MIPS}(\hat{d}, \hat{in}) \cup A_{StI}(StIba, \hat{d}'.m) \cup A^{stack}_{MX}(\widehat{sba}', \widehat{mss}') & : start^{\pi}_{isa}(c) \land end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \land sti(\hat{d}', StIba) = (\widehat{sba}', \widehat{mss}') \\
reads_{MIPS}(d, in) & : isa(c.k) \land \lnot end^{\pi,\theta,cba}_{isa}(d'.cpu) \\
reads_{MIPS}(d, in) \cup A_{StI}(StIba, d'.m) & : isa(c.k) \land end^{\pi,\theta,cba}_{isa}(d'.cpu) \land sti(d', StIba) = \perp \\
reads_{MIPS}(d, in) \cup A_{StI}(StIba, d'.m) \cup A^{stack}_{MX}(sba', mss') & : isa(c.k) \land end^{\pi,\theta,cba}_{isa}(d'.cpu) \land sti(d', StIba) = (sba', mss')
\end{cases} \]
Note that the definition of $reads^{\pi,\theta}_{MXA}(c, in, cba, StIba)$ depends on the intermediate MIPS-86 step because it can change the stack information in the memory.³
The set $writes^{\pi,\theta}_{MXA}(c, in)$ is defined as (i) the MX writes-set for a mixed machine step, (ii) the MIPS-86 writes-set together with the set of stack addresses⁴ in case of the switch to inline assembly, and (iii) the MIPS-86 writes-set for a pure ISA step:
\[ writes^{\pi,\theta}_{MXA}(c, in) \stackrel{def}{\equiv}
\begin{cases}
writes_{MX}(c_{mx}, \pi, \theta) & : mx(c.k) \land \lnot start^{\pi}_{isa}(c) \\
A^{\pi,\theta}_{stack}(c, info, c.k.sba) \cup writes_{MIPS}(\hat{d}, \hat{in}) & : start^{\pi}_{isa}(c) \\
writes_{MIPS}(d, in) & : isa(c.k)
\end{cases} \]
where the compiler information $info$ is computed as before.
³ In order to compute $reads^{\pi,\theta}_{MXA}(c, in, cba, StIba)$ only on the basis of the input configuration, one would need to introduce software conditions requiring that the stack information in the memory is not changed during the step before a possible reconstruction:
\[ isa(c.k) \land end^{\pi,\theta,cba}_{isa}(d'.cpu) \implies A_{StI}(StIba, d.m) = A_{StI}(StIba, d'.m) \]
\[ start^{\pi}_{isa}(c) \land end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \implies A_{StI}(StIba, \hat{d}.m) = A_{StI}(StIba, \hat{d}'.m) \]
⁴ Recall that during the switch to inline assembly the memory region occupied by the stack is provided by the input for the MXA step and put into the memory of the MXA machine.
Definition 7.40 (No Access to icm by MXA Machine). We combine both sets into
\[ accad^{\pi,\theta}_{MXA}(c, in, cba, StIba) \stackrel{def}{\equiv} reads^{\pi,\theta}_{MXA}(c, in, cba, StIba) \cup writes^{\pi,\theta}_{MXA}(c, in) \]
and define a predicate indicating that addresses $icm$ are not accessed
\[ noacc^{\pi,\theta}_{MXA}(c, in, icm, cba, StIba) \stackrel{def}{\equiv} accad^{\pi,\theta}_{MXA}(c, in, cba, StIba) \cap icm = \emptyset \]
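On the level of address sets the two definitions amount to a union and a disjointness test. The following Python sketch assumes that the reads-set and writes-set of the step have already been computed as sets of integer addresses; the function names are illustrative.

```python
def accessed_addresses(reads, writes):
    """All addresses read or written during the step."""
    return set(reads) | set(writes)


def no_access(reads, writes, icm):
    """True iff the step touches no address of the possibly inconsistent memory icm."""
    return accessed_addresses(reads, writes).isdisjoint(icm)
```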
7.3.4 IO- and OT -Points
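Definition 7.41 (IO- and OT-Points for MXA Machine). Using the same shorthands from Definition 7.39, namely,
\[ \hat{d} \equiv conf^{start}_{rMIPS}(c, in) \qquad d \equiv conf^{MXA}_{rMIPS}(c) \]
we define the MXA IO-points as
\[ \mathcal{IO}^{\pi,\theta}_{MXA}(c, in, info, cba) \stackrel{def}{\equiv}
\begin{cases}
\mathcal{IO}^{\pi,\theta}_{MX}(c_{mx}) & : mx(c.k) \land \lnot start^{\pi}_{isa}(c) \\
\mathcal{IO}^{MX}_{rMIPS}(\hat{d}, \pi, info, \theta, cba) & : start^{\pi}_{isa}(c) \\
\mathcal{IO}_{MIPS}(d, in) \land \left( \lnot mode(d.cpu.core) \implies \mathcal{IO}^{MX}_{rMIPS}(d, \pi, info, \theta, cba) \right) & : isa(c.k)
\end{cases} \]
The last case means either any step in user mode, or a processor step in system mode without interrupts such that the core executes cas, locksw, or a volatile access by lw in the compiled code.
The MXA OT-points are simply
\[ \mathcal{OT}^{\pi,\theta}_{MXA}(c, in) \stackrel{def}{\equiv}
\begin{cases}
\mathcal{OT}^{\pi,\theta}_{MX}(c_{mx}) & : mx(c.k) \land \lnot start^{\pi}_{isa}(c) \\
\mathcal{OT}^{MX}_{rMIPS}(\hat{d}) & : start^{\pi}_{isa}(c) \\
\mathcal{OT}_{MIPS}(d, in) & : isa(c.k)
\end{cases} \]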
Definition 7.42 (C-IL IO-Points are Consistency Points in MXA). Additionally, we apply Definition 6.51 for the MXA machine.
\[ \mathcal{IO}cp^{\pi,\theta}_{MXA}(c, info) \stackrel{def}{\equiv} mx(c.k) \implies \mathcal{IO}cp^{\pi,\theta}_{MX}\left( conf^{MXA}_{MX}(c), info \right) \]
Definition 7.43 (IO- and OT-Points for MIPS-86 ISA Executing MXA). For the SB reduced single core MIPS-86 in a configuration $d \in C_{MIPS}$ the corresponding predicates are defined as
\[ \mathcal{IO}^{MXA}_{rMIPS}(d, in, \pi, info, \theta, cba) \stackrel{def}{\equiv} \mathcal{IO}_{MIPS}(d, in) \land \left( \lnot mode(d.cpu.core) \implies \mathcal{IO}^{MX}_{rMIPS}(d, \pi, info, \theta, cba) \right) \]
\[ \mathcal{OT}^{MXA}_{rMIPS}(d, in) \stackrel{def}{\equiv} \mathcal{OT}_{MIPS}(d, in) \]
7.3.5 Requirements and Conditions for MIPS-86 Machine
Now, similarly to the sequential MX compiler correctness, we define requirements on steps of
the SB reduced MIPS-86 machines which will enable the simulation for the extended MX model.
Definition 7.44 (Suitability of Inputs for Reduced MIPS-86 Executing MXA). We consider an input $in \in \Sigma_{MIPS}$ to be suitable for the MXA simulation when the reset signal is low. Using the previous definition for the MIPS-86 encoding the MX machine, we define
\[ suit^{MXA}_{rMIPS}(in) \stackrel{def}{\equiv} suit^{MX}_{rMIPS}(in) \]
Recall that according to Definition 6.54 the well-formedness of the MIPS-86 configuration for the MX simulation required that the processor core is in system mode, the store buffer is empty, and the maskable interrupts are masked. Moreover, the software conditions guaranteed that MASM steps could not change the mode and status registers.
In the extended mixed machine semantics applicable for system programming, one might need to unmask the interrupts already during the hypervisor / OS kernel execution; therefore, we will require the same well-formedness only when the processor executes the compiled MXA program except its inline assembly portions. Moreover, since the inline assembly belongs to the system code, the processor must be in system mode and have an empty store buffer, which is guaranteed by the SB reduction.
Definition 7.45 (Configuration Well-Formedness for Reduced MIPS-86 Executing MXA). Then, for $d \in C_{MIPS}$, given $\pi$, $info$, $cba$, and the shorthand $core \equiv d.cpu.core$, we define
\[ wfconf^{MXA}_{rMIPS}(d, \pi, info, cba) \stackrel{def}{\equiv} \]
(i) $inside(core, info, cba) \land \lnot inline(core, \pi_\mu, info_\mu, cba_\mu) \implies wfconf^{MX}_{rMIPS}(d)$
(ii) $inline(core, \pi_\mu, info_\mu, cba_\mu) \implies \lnot mode(core) \land d.cpu.sb = \varepsilon$
We already have conditions that no reset occurs and the device and overflow interrupts are masked. As we already know, in order to simulate the MX machine, the compiler has to guarantee that no illegal instructions appear in the code and memory accesses are properly aligned. We require this in the well-behaviour of the MIPS-86 machine executing the MXA program. Moreover, again one has to show that the software conditions needed for the store buffer reduction are transferred.
Definition 7.46 (Well-Behaviour of Reduced MIPS-86 Executing MXA). For the same arguments as in the previous definition, an input $in \in \Sigma_{MIPS}$, and the shorthands $core \equiv d.cpu.core$, $I \equiv d.m_4(core.pc)$ we define
\[ wb^{MXA}_{rMIPS}(d, in, \pi, info, cba) \stackrel{def}{\equiv} \]
(i) $sc_{rMIPS}(d, in)$
(ii) $inside(core, info, cba) \land \lnot inline(core, \pi_\mu, info_\mu, cba_\mu) \implies \lnot jisr(core, I, eev, 0, 0)$
where $eev$ comes from the only input possible in this case, $in = (core, w_I, w_D, eev)$.
7.3.6 MXA Software Conditions
The static software conditions for the extended mixed machine differ from those stated for the MX machine in Definition 6.56 only for the code and stack regions. Namely, since the hypervisor / OS kernel can use compiled libraries, the compiled program $\pi$ is included in the code region along with other binaries. Moreover, the stack region can be changed in the MXA semantics.
Definition 7.47 (Static Software Conditions for MXA). Therefore, for given $\pi$, $info$, $\theta$, $cba$, we adapt the needed static conditions from Definition 6.56 as follows:
\[ sc^{stat}_{MXA}(\pi, info, \theta, cba) \stackrel{def}{\equiv} \]
(i) $A^{code}_{MX}(info, cba) \subseteq A_{code}$
(ii) $valid^{code}_{MX}(info, cba)$
(iii) $A^{gvar}_{CIL}(\pi_{cil}, \theta) \setminus A^{const}_{CIL}(\pi_{cil}, \theta) \subset A_{data}$
(iv) $A^{const}_{CIL}(\pi_{cil}, \theta) = A_{const}$
(v) $sc^{prog}_{MX}(\pi, \theta)$
Definition 7.48 (Dynamic Software Conditions for MXA). Given an MXA machine config-
uration c ∈ CMXA and an input in ∈ ΣMXA such that the MXA step c′ ≡ δpi,θ,ιMXA(c, in) with
ι ≡ (cba,StIba) is defined, we state the dynamic software conditions for this step and require
that: (i) for any MX machine step the dynamic MX software conditions hold, (ii) for any MIPS-86
step the software conditions needed for the store buffer reduction hold, (iii) before a successful
reconstruction of an MX thread configuration, the masked interrupts must be masked by the
programmer again, (iv) all stacks found in the memory do not overlap, and (v) – (vi) the current
stack as well as all stack information regions belong to the data region of the memory.
Let cmx, d, d̂, în be the shorthands from Definition 7.39. Then, using d, in, d′, sbas, msss
considered below, we define
scdynMXA(c, in, pi, info, θ, cba,StIba)
def≡
(i) mx(c.k) ∧ ¬startpiisa(c) =⇒
scdynMX(cmx, in, pi, info, θ, cba, c.k.sba, c.k.mss)
(ii) mx(c.k) ∧ startpiisa(c) ∨ isa(c.k) =⇒ scrMIPS(d, in)
(iii) isa(c.k) ∧mx(c′.k) =⇒ ∀i ∈ {1, 7}. d′.cpu.core.spr(sr)[i] = 0
(iv) isa(c.k) ∧mx(c′.k) =⇒ validstacksMX (|sbas|, sbas,msss)
(v) mx(c.k) =⇒ AstackMX (c.k.sba, c.k.mss) ⊂ Adata
(vi) AStI (StIba, c.M) ⊂ Adata
where the corresponding MIPS-86 configuration d and the input in are
(d, in) ≡
    (d̂, în) : mx(c.k) ∧ startpiisa(c)
    (d, in) : isa(c.k)
and, hence, the next MIPS-86 configuration is computed as d′ ≡ δrMIPS(d, in). Moreover,
the information about all available stacks used during the reconstruction is
(sbas, msss) ≡ StI (StIba, d′.m).
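Condition (iii) of Definition 7.48 can be read as a plain bit test on the status register. The following is a minimal sketch in C, assuming that the status register is modelled as a 32-bit word whose bit i corresponds to spr(sr)[i]; the function name is hypothetical and not part of the formal model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring condition (iii):
 * for all i in {1, 7}, spr(sr)[i] = 0.
 * Assumption: bit i of the 32-bit word models spr(sr)[i]. */
static bool sr_bits_1_7_clear(uint32_t sr)
{
    const uint32_t mask = (1u << 1) | (1u << 7);
    return (sr & mask) == 0;
}
```

A checker for the dynamic software conditions would evaluate such a test on the status register of the configuration d′ reached after the reconstruction step.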
Definition 7.49 (MXA Software Conditions). Combining both definitions from above, we get
scMXA(c, in, pi, info, θ, cba,StIba)
def≡
(i) scstatMXA(pi, info, θ, cba)
(ii) scdynMXA(c, in, pi, info, θ, cba,StIba)
7.3.7 Sequential MXA Compiler Correctness in Concurrent Context
As in Section 6.1.2.8, we introduce a few auxiliary definitions and state the sequential MXA
compiler correctness needed for the justification of the concurrent MXA model. The definitions
here are simply obtained from Section 6.1.2.8 by replacing the predicates for the MX machine
with the corresponding ones defined in this chapter.
For existing MXA steps from a given configuration c0 ∈ CMXA till the next consistency point,
we require that the MXA software conditions hold, the possible inconsistent region icm of mem-
ory is not accessed, and the configuration of the MXA machine at the consistency point is well-
formed. Formally, using ι ≡ (cba,StIba), we define
SCseqMXA(c0, pi, info, θ, cba,StIba, icm)
def≡
∀n ∈ N, c ∈ (CMXA⊥)n+1 , λ ∈ (ΣMXA)n .
(i) c1 = c0 ∧ (c1 −→nδpi,θ,ιMXA,λ cn+1)
(ii) ∀i ∈ [2 : n]. ci ≠ ⊥ =⇒ ¬cpMXA(ci, pi, info, θ, cba)
(iii) cn+1 ≠ ⊥ =⇒ cpMXA(cn+1, pi, info, θ, cba)
=⇒
(i) ∀i ∈ Nn. scMXA(ci, λi, pi, info, θ, cba, StIba) ∧ noaccpi,θMXA(ci, λi, icm, cba, StIba)
(ii) cn+1 ≠ ⊥ =⇒ wfconf pi,θ,cbaMXA (cn+1)
In order to restrict the number of IO-points and to require the proper implementation of MXA
steps suitable for IO-operations and ownership transfer, we define a predicate for non-empty
sequences d ∈ (CMIPS)∗, σ ∈ (ΣMIPS)∗ such that |d| = |σ| + 1, and c ∈ (CMXA)∗, τ ∈ (ΣMXA)∗
with |c| = |τ| + 1:
oneIOMXArMIPS(d, σ, c, τ, pi, info, θ, cba)
def≡
(i) ∀i, j ∈ N|τ|. IOpi,θMXA(ci, τi, info, cba) ∧ IOpi,θMXA(cj, τj, info, cba) =⇒ i = j
(ii) ∀i, j ∈ N|σ|. IOMXArMIPS(di, σi, pi, info, θ, cba) ∧ IOMXArMIPS(dj, σj, pi, info, θ, cba) =⇒ i = j
(iii) (∃i ∈ N|τ|. IOpi,θMXA(ci, τi, info, cba)) =⇒ (∃i ∈ N|σ|. IOMXArMIPS(di, σi, pi, info, θ, cba))
(iv) (∃i ∈ N|τ|. OT pi,θMXA(ci, τi)) ⇐⇒ (∃i ∈ N|σ|. OT MXArMIPS(di, σi))
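The shape of conditions (i) to (iii) can be illustrated on plain arrays of flags. The sketch below is only an illustration in C: the boolean arrays are hypothetical stand-ins for evaluating the IO predicates on every step of the MXA and MIPS-86 sequences, and all function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Conditions (i)/(ii): for all i, j, io[i] and io[j] imply i == j,
 * i.e. at most one step of the sequence is flagged as an IO step. */
static bool at_most_one_io(const bool *io, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (io[i] && ++count > 1) {
            return false;
        }
    }
    return true;
}

/* True iff at least one step of the sequence is flagged. */
static bool any_io(const bool *io, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (io[i]) {
            return true;
        }
    }
    return false;
}

/* Condition (iii): an IO step on the abstract (MXA) level implies
 * an IO step on the implementation (MIPS-86) level. */
static bool io_implied(const bool *abs_io, size_t na,
                       const bool *impl_io, size_t ni)
{
    return !any_io(abs_io, na) || any_io(impl_io, ni);
}
```

Condition (iv) has the same structure with an equivalence instead of an implication, applied to the ownership transfer flags.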
Additionally, we use the following shorthands to indicate that there are no consistency points
in the given sequences:
nocpMXArMIPS(d, pi, info, θ, cba)
def≡ ∀i ∈ N|d|. ¬cpMXArMIPS(di, pi, info, θ, cba)
nocpMXA(c, pi, info, θ, cba)
def≡ ∀i ∈ N|c|. ¬cpMXA(ci, pi, info, θ, cba)
Finally, the sequential compiler correctness for the extended mixed machine can be formulated
in the way needed for its application in the concurrent context. Its statement is very similar
to the correctness statement for the mixed machine given in detail in Theorem 6.1. The only
major difference is that instead of the base address of the stack and its maximal size we
consider the base address of the stack information. Moreover, the possibly inconsistent memory
icm for the MXA machine does not include the code region.
Theorem 7.1 (Sequential MXA Compiler Correctness in Concurrent Context).
∀pi ∈ ProgMX, cba ∈ B32 × B32, StIba ∈ B32, θ ∈ ParamsCIL,
c0 ∈ CMXA, info ∈ infoT MX, icm ∈ 2^(B32), k ∈ N, d ∈ (CMIPS)k+1 , ω ∈ (ΣMIPS)k .
(i) wfconf pi,θ,cbaMXA (c0) ∧ wfconf MXArMIPS(d1, pi, info, cba)
(ii) cpMXA(c0, pi, info, θ, cba) ∧ cpMXArMIPS(d1, pi, info, θ, cba)
(iii) icm ⊂ Ahyp \AcodeMX (info, cba) ∧ consisMXA(c0, d1, pi, θ, info, cba, icm)
(iv) (d1 −→kδrMIPS,ω dk+1) ∧ ∀i ∈ Nk. suitMXArMIPS (ωi)
(v) nocpMXArMIPS(d[2 : k], pi, info, θ, cba)
(vi) SCseqMXA(c0, pi, info, θ, cba,StIba, icm)
=⇒
∃n ∈ N, d′ ∈ (CMIPS)n+1 , σ ∈ (ΣMIPS)n ,m ∈ N, c ∈ (CMXA)m+1 , τ ∈ (ΣMXA)m .
(i) n ≥ k ∧ d′[1 : k + 1] = d ∧ σ[1 : k] = ω
(ii) (d′1 −→nδrMIPS,σ d′n+1) ∧ ∀i ∈ Nn. suitMXArMIPS (σi)
(iii) cpMXArMIPS(d′n+1, pi, info, θ, cba) ∧ nocpMXArMIPS(d′[2 : n], pi, info, θ, cba)
(iv) wfconf MXArMIPS(d′n+1, pi, info, cba) ∧ ∀i ∈ Nn. wbMXArMIPS(d′i, σi, pi, info, cba)
(v) c1 = c0 ∧ (c1 −→mδpi,θ,ιMXA,τ cm+1) ∧ wfconf pi,θ,cbaMXA (cm+1)
(vi) cpMXA(cm+1, pi, info, θ, cba) ∧ nocpMXA(c[2 : m], pi, info, θ, cba)
(vii) validcpMX(pi, info, θ) ∧ IOcppi,θMXA(cm+1, info)
(viii) oneIOMXArMIPS(d′, σ, c, τ, pi, info, θ, cba)
(ix) consisMXA(cm+1, d′n+1, pi, θ, info, cba, icm)
with ι ≡ (cba,StIba).
We leave the proof of this theorem out of the scope of the thesis because we do not consider
the compiler implementation here. In order to prove the claim, along with the application of
Theorem 6.1 one has to argue about additional cases. For instance, if after an inline assembly
step the control returns to the compiled code of the program pi not at a C-IL consistency
point, the compiler must also guarantee that the next consistency point is reachable. This
claim, however, is not covered by Theorem 6.1 because there we consider only executions
starting from MX consistency points.
7.4 Justification of the Concurrent Extended Mixed Model
7.4.1 Cosmos Model Instantiations
Concurrent Extended Mixed Machine
Given a program pi = (piµ, picil) ∈ ProgMX with inline assembly, the environment parameters
θ ∈ ParamsCIL, the system information cι ≡ (cba, StIbas) with cba ∈ B32 × B32, StIbas ∈ (B32)np
wrt. the number np ∈ N of processors in the multi-core MIPS-86 machine, and the compiler
information info ≡ (cmplMASM(piµ), cmplCIL(picil)), we define the instantiation Spi,θ,cιMXA ∈ S of the
Cosmos model for the extended mixed machine.
Let cmxa(u,m) ≡ (u.k, ⌈m⌉B32) be an MXA configuration composed of the MXA thread
configuration u.k and the partial memory m : B32 ⇀ B8 extended with dummy values. Then,
the signature of the Cosmos model Spi,θ,cιMXA is defined as:
• Spi,θ,cιMXA .A = B32
In contrast to Spi,θ,ξMX from Section 6.2.1, the code and stack regions are visible in the con-
current MXA model.
• Spi,θ,cιMXA .V = B8
• Spi,θ,cιMXA .R = AconstCIL (picil, θ) ∪Acode
with AcodeMX (info, cba) ⊆ Acode following from the MXA software conditions.
• Spi,θ,cιMXA .nu = np
• Spi,θ,cιMXA .U = (k ∈ KMXA, StIba ∈ B32) ∪ {⊥}
The second component is introduced into the unit configuration in order to match the
formalism of the Cosmos model, because we never use an index of the unit in the Cosmos model
instantiation. Therefore, for any unit configuration u ∈ Spi,θ,cιMXA .U we will refer to the MXA
thread configuration by u.k, and to the stack information base address of this unit by u.StIba.
Obviously, we will require for E ∈ CSpi,θ,cιMXA
StIbainvMXA(E) def≡ ∀p ∈ Nnu. E.up.StIba = StIbasp
• Spi,θ,cιMXA .E = ΣMXA
• Spi,θ,cιMXA .reads(u,m, in) =
    readspi,θMXA ((u.k,m), in, cba, u.StIba) : u ≠ ⊥
    ∅ : otherwise
• Spi,θ,cιMXA .δ(u,m, in) =
    (u′, c′mxa.M|writespi,θMXA(cmxa(u,m),in)) : u ≠ ⊥ ∧ c′mxa ≠ ⊥
    (⊥, m∅) : u ≠ ⊥ ∧ c′mxa = ⊥
    undefined : otherwise
where c′mxa ≡ δpi,θ,ιMXA (cmxa(u,m), in) with ι ≡ (cba, u.StIba) is the next sequential MXA
machine configuration, u′ ≡ (c′mxa.k, u.StIba) is the next unit configuration, and m∅ denotes
the empty function with dom (m∅) = ∅.
• Spi,θ,cιMXA .IP(u,m, in) = (u ≠ ⊥ =⇒ cpMXA(u.k, pi, info, θ, cba))
• Spi,θ,cιMXA .IO(u,m, in) = (u ≠ ⊥ =⇒ IOpi,θMXA (cmxa(u,m), in, info, cba))
• Spi,θ,cιMXA .OT (u,m, in) = (u ≠ ⊥ =⇒ OT pi,θMXA (cmxa(u,m), in))
SB Reduced Multi-Core MIPS-86 Implementing Concurrent MXA Machine
The instantiation Spi,θ,cιrMIPS ∈ S of the Cosmos model with the SB reduced MIPS-86 concurrently
running the compiled code of the MXA program pi is very similar to Spi,θ,ξrMIPS from the previous
chapter and is defined as:
• For components X ∈ {A,V,R,nu,U , E , reads, δ} the instantiation is equal to the one for
the reduced machine
Spi,θ,ξrMIPS.X = SrMIPS.X
with AcodeMX (info, cba) ⊆ Acode and Aconst = AconstCIL (picil, θ).
• Spi,θ,cιrMIPS.IP(u,m, in) = cpMXArMIPS(u.core, pi, info, θ, cba)
• Spi,θ,cιrMIPS.IO(u,m, in) = IOMXArMIPS ((u, ⌈m⌉B32), in, pi, info, θ, cba)
• Spi,θ,cιrMIPS.OT (u,m, in) = OT MXArMIPS ((u, ⌈m⌉B32), in)
7.4.2 Sequential Simulation Theorem
As the next step, we instantiate the sequential simulation framework RSMXASrMIPS(pi, θ, cι) ∈ R for
the two Cosmos models Spi,θ,cιMXA , Spi,θ,cιrMIPS ∈ S defined wrt. the given pi, θ, and cι ≡ (cba, StIbas).
For d ∈ Cproc × Cm, d = (cpu,m), c ∈ Spi,θ,cιMXA .U × (B32 → B8), c = (u,M), and info ∈ infoT MX,
icm ∈ 2^(B32) we define
RSMXASrMIPS(pi, θ, cι).
    P = infoT MX
    sim(d, c, info, icm) = icm ⊂ Ahyp \AcodeMX (info, cba) ∧ validcpMX(pi, info, θ) ∧ IOcppi,θMXA(c, info) ∧
        (u ≠ ⊥ =⇒ consisMXA ((u.k,M), d, pi, θ, info, cba, icm))
    CPa(u, info) = (u ≠ ⊥ =⇒ cpMXA(u.k, pi, info, θ, cba, info))
    CPc(cpu, info) = cpMXArMIPS(cpu.core, pi, info, θ, cba)
    wfa(c) = u ≠ ⊥ ∧ wfconf pi,θ,cbaMXA ((u.k,M))
    wfc(d) = wfconf MXArMIPS(d, pi, info, cba)
    suit(ind) = suitMXArMIPS(ind)
    sc(c, inc, info) = (u ≠ ⊥ =⇒ scMXA((u.k,M), in, pi, info, θ, cba, u.StIba))
    wb(d, ind, info) = wbMXArMIPS(d, ind, pi, info, cba)
Obviously, having proven Theorem 7.1, one easily gets that the generalized sequential
simulation Theorem 2.3 also holds for Spi,θ,cιMXA , Spi,θ,cιrMIPS ∈ S and the framework
RSMXASrMIPS(pi, θ, cι) ∈ R instantiated wrt. the given program pi and the parameters θ, cι. Hence,
the sequential MXA compiler correctness in the concurrent setting in terms of Cosmos machines
is stated similarly to Theorem 6.2.
Theorem 7.2 (Sequential MXA Compiler Correctness for Cosmos Model Simulation). The
generalized sequential simulation Theorem 2.3 holds for any Cosmos models Spi,θ,cιMXA , Spi,θ,cιrMIPS ∈ S
and the simulation framework RSMXASrMIPS(pi, θ, cι) ∈ R instantiated wrt. any given mixed machine
program pi = (piµ, picil) ∈ ProgMX with inline assembly, the environment parameters θ ∈ ParamsCIL,
and the system information cι ≡ (cba, StIbas) with cba ∈ B32 × B32, StIbas ∈ (B32)np, and the
number np ∈ N of processors in the multi-core MIPS-86 machine.
7.4.3 Concurrent Model Simulation
Finally, we consider the shared and unit invariants as well as other properties which enable the
simulation between MXA and SB reduced MIPS-86 Cosmos machines for the given program pi
and parameters θ, cι.
First, in order to define the property PSpi,θ,cιMXA instantiating PSa in Theorem 2.4, we introduce
auxiliary predicates.
Definition 7.50 (Guest Addresses are Shared in MXA Cosmos machine). Obviously, as before,
for any configuration E ∈ CSpi,θ,cιMX of the MXA Cosmos machine we require that Aguest is always
treated as shared and no addresses from Aguest can be owned by any execution unit.
oginvMXA(E)
def≡ Aguest ⊂ E.S ∧ ∀p ∈ Nnu. E.Op ∩Aguest = ∅
Definition 7.51 (Current Stacks and Stack Information are Local in MXA Cosmos machine).
Moreover, for such configurations E we require that the memory regions occupied by the stacks
having the MX abstraction are locally owned. The same also holds for all stack information
regions.
ostinvMXA(E)
def≡
(i) ∀p ∈ Nnu. mx(E.up.k) =⇒ AstackMX (sbap,mssp) ⊂ E.Op \ E.S
(ii) ∀p ∈ Nnu. AStI (E.up.StIba, E.m) ⊂ E.Op \ E.S
where sbap ≡ E.up.k.sba and mssp ≡ E.up.k.mss denote the stack base address and maximal
stack size for the unit p.
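To make the ownership invariant of Definition 7.50 more tangible, the following sketch in C checks it on a toy address space. It is only an illustration under explicit assumptions: addresses are indices into small boolean arrays, the sizes N_ADDR and N_UNITS are arbitrary, the inclusion is read as non-strict, and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { N_ADDR = 8, N_UNITS = 2 };   /* toy sizes chosen for illustration */

/* Toy ownership state: shared[a] models a in E.S,
 * owned[p][a] models a in E.O_p. */
typedef struct {
    bool shared[N_ADDR];
    bool owned[N_UNITS][N_ADDR];
} ownership_t;

/* oginv: every guest address is shared and owned by no unit. */
static bool oginv(const ownership_t *e, const bool guest[N_ADDR])
{
    for (size_t a = 0; a < N_ADDR; a++) {
        if (!guest[a]) {
            continue;
        }
        if (!e->shared[a]) {
            return false;
        }
        for (size_t p = 0; p < N_UNITS; p++) {
            if (e->owned[p][a]) {
                return false;
            }
        }
    }
    return true;
}
```

The invariant of Definition 7.51 has the dual shape: every address of a stack or stack information region of unit p must be owned by p and must not be shared.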
Therefore, the property PSpi,θ,cιMXA on configurations of the MXA Cosmos machine is formalized
as
PSpi,θ,cιMXA (E) def≡ oginvMXA(E) ∧ ostinvMXA(E) ∧ StIbainvMXA(E)
Since the sets of memory addresses in both machines are equal, the shared invariant is trivial.
Definition 7.52 (Shared Invariant for Concurrent MXA Machine Simulation). For the Cosmos
models instantiated with the extended mixed machine and the SB reduced MIPS-86 we demand
the equality of their sets of shared addresses S, Smxa, read-only addresses R, Rmxa, and
addresses O(p), Omxa(p) owned by each unit p, as well as the equality of the contents of the
shared and read-only memories m, mmxa.
sinvMXASrMIPS(pi, θ, cι)((m,S,R,O), (mmxa,Smxa,Rmxa,Omxa)) def≡
(i) S = Smxa
(ii) R = Rmxa
(iii) ∀p ∈ Nnu . O(p) = Omxa(p)
(iv) m = mmxa
As in the concurrent MX machine simulation, we do not need to introduce any restrictions
on the unit’s configuration of the SB reduced machine and set for cpu ∈ Cproc
uinvSMXASrMIPS(pi, θ, cι)(cpu, O, S) def≡ 1
Now, given that Assumptions 2.1– 2.4 are discharged for the instantiated machines wrt. any
pi, θ, cι, one easily guarantees that the Cosmos model simulation theorem holds for each case.
Theorem 7.3 (Cosmos Model Simulation Theorem for all MX Programs with Inline Assem-
bly and System/Environment Parameters). Theorem 2.4 holds for any programs pi = (piµ, picil) ∈
ProgMX with inline assembly, the environment parameters θ ∈ ParamsCIL, and the system information
cι ≡ (cba,StIbas) used for instantiation of the models Spi,θ,cιMXA , Spi,θ,cιrMIPS ∈ S.
7.4.4 Application of the Concurrent MXA Machine Simulation
The application of the concurrent MXA machine simulation for the order reduction is very
similar to the one covered in Section 6.2.4.
In particular, one can easily prove the claim of Lemma 6.1 for wbMXArMIPS(d, in, pi, info, cba) by the
same argumentation as before.
Moreover, for the store buffer reduction we need the property PSrMIPS(E) for E ∈ CSrMIPS .
In fact, we have oginvMXA(C) for C ∈ CSpi,θ,cιMX and the shared invariant sinv(D,C, par) requires
that the sets of addresses and memory regions involved in the coupling are the same for C
and D ∈ CS′pi,θ,cιrMIPS of the extended SB reduced MIPS-86 Cosmos machine. Therefore, the claim
of Lemma 6.2 for this case follows directly from the shared invariant and needs no further
argumentation.
Additionally, one can also consider the transfer of any other divisible property P′Spi,θ,cιMXA
of the MXA Cosmos machine (such that the simulation hypotheses hold) into the incompletely
simulated property Q[P′Spi,θ,cιMXA , par] of S′pi,θ,cιrMIPS.
Having transferred the well-behaviour and safety properties from suitable block schedule
computations to arbitrarily interleaved computations of the SB reduced multi-core MIPS-86,
one can apply the order reduction and the store buffer reduction in the way sketched in
Section 6.2.4. We leave these technical details here as a simple bookkeeping exercise.
8 Concurrent Kernel Threads: Model, Implementation, and Correctness Criteria
In the previous chapter we considered and justified the semantics of the concurrent machine
with MXA threads being executed on the multi-core MIPS-86. This semantics can be used for
the implementation of hypervisor and operating system kernels where multi-threading can be
restricted by the number of processor cores. However, as we already know, it is typical to have
more threads operating in the address space of a given system or user process.
In this chapter we proceed with the extension of the concurrent MXA semantics with
cooperative¹ POSIX-like threads [POS95, But97], provide their abstract model and
implementation, and consider the correctness criteria in detail. We chose a minimal set of
operations on kernel threads which should be sufficient for multi-threaded system programming
as well as for the implementation of other functions of the POSIX API. In particular, we pay
attention to the semantics and correctness of the thread switch based on stack substitution,
and to special functions working on shared resources that require lock protection.
To the best of our knowledge, although the stack-based switch is a classical method used by
system programmers (e.g., see Chapter 20 in [Han96]), the detailed semantics of such a thread
switch and the correctness of its implementation for industrial-like higher-level programming
languages on multi-core machines have not been considered in any known scientific work (e.g.,
[FS05, FSV+06, FSDG08, NYS07, GFSS12]). Moreover, in order to specify operations using the lock
protection, we take into account our model for concurrent simulation and split the operations
into phases similar to those considered in [CL98].
The semantics of kernel threads given here is more detailed than just the concurrent MXA
machine containing a number of threads greater than the number of processor cores and
additionally specifying operations on them. Some details (e.g., information about the stack in
the call of the thread creation function, thread identifiers explicitly provided for the thread
operations) are needed by the thread manager responsible for the scheduling policy, cleaning
up finished threads, queuing requests between threads, etc., and can be further abstracted
away in the presence of its implementation and the memory manager performing the memory
allocation.
¹ In contrast to preemptive threads, which are switched by a scheduler that decides on the time slot for
the thread execution and relies on timer interrupts, cooperative threads decide by themselves when and to
which thread they hand over control.
8.1 Abstract Model of Kernel Threads
8.1.1 Program and Parameters
In the model we consider a program pi = (piµ, picil) ∈ ProgMX corresponding to the code of
the hypervisor / OS kernel implementation. Since the context switch and virtualization are
supposed to be performed on this level, the MASM program piµ is allowed to contain inline
assembly portions.
The C-IL program picil has the declaration of the function pointer type thfun_t for thread entry
functions
typedef void (*thfun_t)(u32);
and the external functions
extern void thread_create(u32 tid, u32 pid, thfun_t fn, u32 arg, void *sba, u32 mss);
extern void thread_acquire();
extern void thread_run(u32 tid);
extern void thread_exit_to(u32 tid);
extern void thread_delete(u32 tid);
implementing the operations on the kernel threads, namely
• thread_create – creation of a new thread for a given processor,
• thread_acquire – acquisition of all newly created threads by the running processor,
• thread_run – switch to a thread with a given identifier,
• thread_exit_to – finishing the running thread and switching to a given thread,
• thread_delete – deletion of a given finished thread.
We call them special functions or primitives and define the set Fprimname ⊂ Fname of the corresponding
function names as
Fprimname def≡ {thread_create, thread_acquire, thread_run, thread_exit_to, thread_delete}
A typical lifecycle of a thread is depicted in Figure 8.1, where schedule is an operation
performed by either thread_run or thread_exit_to, which we call scheduling primitives. A new
thread with a free (i.e., not used by other threads) identifier is created by a call of
thread_create, which additionally takes as parameters the ID of the processor on which the
thread will run, the pointer to the thread entry function, its argument, and the information
about the allocated stack. After its creation, a new thread resides in a global pool of new
threads visible to all processors and can only be locally scheduled after its acquisition by
the corresponding processor. When the primitive thread_acquire is executed on a processor, all
existing new threads with the corresponding processor ID are acquired and become ready for
scheduling. A new thread starts to run on a processor, or a sleeping thread is restored, when
one of the scheduling primitives is called with its ID. Moreover, the running thread calling
such a primitive either switches to the sleeping state in case of thread_run or becomes
finished after the execution of thread_exit_to. A finished thread cannot be scheduled again
and can only be deleted by thread_delete executed in the code of any running thread present in
the model.
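The lifecycle described above can be sketched as a small state machine. The C code below is a simplified illustration and not the formal semantics: it assumes a single merged state TH_READY for threads that are acquired but not running (both never-run and sleeping threads), it views every operation from the perspective of the affected thread, and all type, constant, and function names are hypothetical.

```c
#include <stdbool.h>

/* Thread states; TH_READY is an assumption that merges acquired
 * but never-run threads with sleeping threads. */
typedef enum {
    TH_INVALID, TH_NEW, TH_READY, TH_RUNNING, TH_FINISHED
} th_state_t;

/* Lifecycle operations as seen from the affected thread. */
typedef enum {
    OP_CREATE,       /* thread_create: a free identifier becomes a new thread */
    OP_ACQUIRE,      /* thread_acquire: a new thread becomes ready            */
    OP_SCHEDULE_IN,  /* target of thread_run or thread_exit_to starts to run  */
    OP_SCHEDULE_OUT, /* caller of thread_run goes to sleep                    */
    OP_EXIT,         /* caller of thread_exit_to becomes finished             */
    OP_DELETE        /* thread_delete: a finished thread is removed           */
} th_op_t;

/* Applies op to *s if the lifecycle allows it; otherwise returns
 * false and leaves *s unchanged. */
static bool th_step(th_state_t *s, th_op_t op)
{
    th_state_t from, to;
    switch (op) {
    case OP_CREATE:       from = TH_INVALID;  to = TH_NEW;      break;
    case OP_ACQUIRE:      from = TH_NEW;      to = TH_READY;    break;
    case OP_SCHEDULE_IN:  from = TH_READY;    to = TH_RUNNING;  break;
    case OP_SCHEDULE_OUT: from = TH_RUNNING;  to = TH_READY;    break;
    case OP_EXIT:         from = TH_RUNNING;  to = TH_FINISHED; break;
    case OP_DELETE:       from = TH_FINISHED; to = TH_INVALID;  break;
    default:              return false;
    }
    if (*s != from) {
        return false;
    }
    *s = to;
    return true;
}
```

In this sketch a call of thread_run(tid) corresponds to OP_SCHEDULE_OUT on the caller together with OP_SCHEDULE_IN on the thread tid, and thread_exit_to(tid) to OP_EXIT on the caller together with OP_SCHEDULE_IN on tid.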
Along with the program, as before we keep a corresponding C-IL environment parameter
θ ∈ ParamsCIL.
[Figure 8.1: diagram of the thread lifecycle. Labels: invalid (free), new, running, sleeping, finished; creation, acquisition, schedule, exit, deletion.]
Figure 8.1: Thread lifecycle. The circles in the figure denote thread states, whereas the edges
are operations executed by the primitives. The underlined operations are performed on data that
can be shared between processors and, therefore, are considered to be global, in contrast to
the schedule and exit operations, which are local.
We consider the model with np ∈ N running threads² corresponding to the number of
processors with identifiers from the set Nnp. The constant mtid ∈ N represents the maximal thread
identifier. Each thread ID belongs to Nmtid. Moreover, we require mtid ≥ np.
Since in the model of kernel threads we will also consider the execution of inline assembly
as well as guest / process steps, we will basically rely on a simplified version of the extended
mixed machine (from Chapter 7) where only the stack of the running thread may be recon-
structed after the pure MIPS-86 ISA steps. Therefore, we introduce here again the code base
addresses cba = (cbacil, cbaµ) ∈ B32 × B32 such that validcodeMX (info, cba) holds, and the com-
piler information info = (infocil, infoµ) ∈ infoT MX computed as infoµ = cmplMASM(piµ) and
infocil = cmplCIL(picil).
8.1.2 Sequential Semantics in Concurrent Setting
8.1.2.1 Machine Configuration
Though in the model for clarity we will explicitly distinguish between running, sleeping, fin-
ished, and newly created threads, we introduce only two kinds of thread configurations, namely,
running and non-running. Intuitively, new, sleeping, and finished threads have a configuration
of the non-running thread, which, however, must be restricted by the well-formedness depend-
ing on the state.
In Definition 7.1 we have already introduced the configuration of the MXA thread represented
by the MX thread or the processor configuration in case of inline assembly and guest /
process steps. The configuration of the running thread is obtained by extending this definition
so that the processor configuration is accompanied by the base address and maximal stack size
too. These components are needed in order to allow the context switch in the model.
² In this thesis we do not consider the system startup during which at least one thread is created for each processor of
the multi-core machine.
Definition 8.1 (Configuration of the Running Thread). The configuration of the running thread
KrunTh
def≡ KmxTh ∪KisaTh
is either the MXA configuration in case of the mixed machine steps
KmxTh
def≡ (kmx ∈ KMX, sba ∈ B32, mss ∈ N, tlb ∈ Ctlb)
or the extended processor configuration during inline assembly, guest, or user steps
KisaTh
def≡ (cpu ∈ Cproc, sba ∈ B32, mss ∈ N)
Since we do not support thread migration in this work, the processor affinity of any running
thread can be directly taken from the corresponding special purpose register.
Definition 8.2 (Processor Affinity of the Running Thread). Given a configuration th ∈ KrunTh of
a running thread, we obtain its processor affinity as
pidrun(th) def≡
    〈th.kmx.ac.spr(pid)〉 : th ∈ KmxTh
    〈th.cpu.core.spr(pid)〉 : otherwise
In comparison to the running threads, the configuration of a non-running thread is simpler.
Such a thread can be newly created or be the result of the switch performed by a call of the
corresponding primitive from the C-IL context. Therefore, the active context of the non-running
thread can be only of the C-IL type. Moreover, usually during the thread switch the special
purpose registers are not saved and restored, and we do not keep them in the configuration
either.
Definition 8.3 (Configuration of a Non-Running Thread). Therefore, we model any non-running
thread by the list of inactive execution contexts, the active C-IL context, the same components
sba, mss characterizing the allocation of the stack in the memory, and the processor affinity pid.
KnrunTh def≡ (ac ∈ contextCIL, ic ∈ (contextCIL ∪ context inactiveMASM)∗, sba ∈ B32, mss ∈ N, pid ∈ Nnp)
As before, we define a configuration corresponding to the execution unit. Such a configuration
contains all threads belonging to a single processor, except those threads that have not yet
been acquired.
Definition 8.4 (Configuration of Processor’s Threads). The configuration of threads belonging
to a given processor is represented by a tuple containing two mappings ready and fin from
thread identifiers to configurations of threads: those ready for scheduling together with the
scheduled one in the first case, and the finished ones in the second. The identifier of the
scheduled (or current) thread is kept in the component ct.
KprocTh def≡ (ready : Nmtid ⇀ KrunTh ∪ KnrunTh, fin : Nmtid ⇀ KnrunTh, ct ∈ Nmtid)
Finally, we define the configuration of the machine with kernel threads. Though we still
consider the sequential case, the definition is adapted for the use in the concurrent setting.
Definition 8.5 (Configuration of Sequential Machine for Kernel Threads). Full configurations
of the sequential machine with kernel threads are defined by the set CTh of tuples
CTh def≡ (k ∈ KprocTh, new : Nmtid ⇀ KnrunTh, free ∈ 2^(Nmtid), ap ∈ [0 : np], M : B32 → B8)
containing the following components:
• k – the local configuration of the processor’s threads,
• new – the mapping from the identifier of newly created threads to their configurations,
• free – a set of free thread identifiers that can be used for the thread creation,
• ap – the identifier of a processor currently working on global new and free (or accessing
processor). If no processor accesses them, we have ap = 0. The component ap models a
lock used in the implementation of the kernel threads.
• M – the global byte-addressable memory.
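The role of the component ap as an abstract lock can be illustrated as follows. This is only a sketch in C of the abstract state, not of the lock implementation or of the transition function: the assumption is that a processor with identifier pid ≥ 1 may start working on new and free only if ap = 0, and releases them by resetting ap to 0. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abstract view of the accessing processor: 0 means that no
 * processor currently works on the global components new and free. */
typedef struct {
    uint32_t ap;
} globals_t;

/* Processor pid (assumed to be >= 1) starts accessing the globals.
 * Succeeds only if nobody accesses them yet. */
static bool globals_try_lock(globals_t *g, uint32_t pid)
{
    if (pid == 0 || g->ap != 0) {
        return false;
    }
    g->ap = pid;
    return true;
}

/* Processor pid finishes its access; only the current accessing
 * processor may release the globals. */
static bool globals_unlock(globals_t *g, uint32_t pid)
{
    if (pid == 0 || g->ap != pid) {
        return false;
    }
    g->ap = 0;
    return true;
}
```

A real implementation would need an atomic hardware operation for the acquisition; the sketch only captures which values of ap are reachable.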
8.1.2.2 Transition Function
In order to define the semantics, we first introduce a few overloaded auxiliary functions
needed for using the semantics from lower levels of the model stack.
Definition 8.6 (Current Thread Configuration). For a given configuration k ∈ KprocTh of proces-
sor’s threads we define a function computing the configuration of the current thread as
thcur (k)
def≡ k.ready(k.ct)
Therefore, for a configuration c ∈ CTh of the sequential machine for kernel threads we simply
have
thcur (c)
def≡ thcur (c.k)
Using this definition we can introduce the shorthands for the base address and maximal size
of the stack for the current thread:
sbacur (c)
def≡ thcur (c).sba
msscur (c)
def≡ thcur (c).mss
Given a configuration c ∈ CTh and Definition 8.2 we can also denote the processor affinity of
the current thread by
pidcur (c)
def≡ pidrun(thcur (c))
Definition 8.7 (Current Thread to MXA Machine). We also easily transform the configuration
of the current thread in k ∈ KprocTh into the MXA thread configuration from Section 7.1.2 by the
function
conf curTKMXA(k) def≡
    th : th ∈ KmxTh
    th.cpu : otherwise
with th ≡ thcur (k).
Hence, the configuration of the current thread in a configuration c ∈ CTh can be easily trans-
formed into an MXA machine configuration by
conf curTMXA(c) def≡ (conf curTKMXA(c.k), c.M)
Moreover, we introduce functions transforming a non-running thread into a running one and
the other way around.
Definition 8.8 (Thread Transformation). Given a configuration th ∈ KnrunTh of a non-running
thread, special purpose registers spr, and tlb ∈ Ctlb from the configuration of the current
thread, we get a corresponding configuration of a running thread by the function
wakeup(th, spr, tlb)
def≡ (kmx, th.sba, th.mss, tlb)
with kmx = (th.ac, th.ic, spr).
For the reverse operation on a running thread with a configuration th ∈ KmxTh we define
sleep(th)
def≡ (th.kmx.ac, th.kmx.ic, th.sba, th.mss, pidrun(th))
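The two transformations of Definition 8.8 can be sketched on simplified records. In the C code below, execution contexts and the TLB are represented by opaque integer handles, only the pid component of the special purpose registers is modelled, and the registers are stored as a component of kmx as in the equation kmx = (th.ac, th.ic, spr); these simplifications and all names are assumptions made for illustration.

```c
#include <stdint.h>

typedef int ctx_t;                      /* opaque stand-in for a context (list) */
typedef int tlb_t;                      /* opaque stand-in for the TLB          */
typedef struct { uint32_t pid; } spr_t; /* only spr(pid) is modelled            */

/* Non-running thread: (ac, ic, sba, mss, pid). */
typedef struct {
    ctx_t ac, ic;
    uint32_t sba, mss, pid;
} nrun_th_t;

/* MX thread configuration kmx = (ac, ic, spr). */
typedef struct {
    ctx_t ac, ic;
    spr_t spr;
} kmx_t;

/* Running thread in MX mode: (kmx, sba, mss, tlb). */
typedef struct {
    kmx_t kmx;
    uint32_t sba, mss;
    tlb_t tlb;
} mx_th_t;

/* wakeup(th, spr, tlb) = ((th.ac, th.ic, spr), th.sba, th.mss, tlb) */
static mx_th_t wakeup(nrun_th_t th, spr_t spr, tlb_t tlb)
{
    mx_th_t r = { { th.ac, th.ic, spr }, th.sba, th.mss, tlb };
    return r;
}

/* sleep(th) = (kmx.ac, kmx.ic, sba, mss, pid), where pid is read
 * from the special purpose registers of the running thread. */
static nrun_th_t to_sleeping(mx_th_t th)
{
    nrun_th_t r = { th.kmx.ac, th.kmx.ic, th.sba, th.mss, th.kmx.spr.pid };
    return r;
}
```

Under these assumptions, putting a thread to sleep right after waking it up restores the original record whenever the registers passed to wakeup carry the processor affinity stored in the thread, which matches the absence of thread migration.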
As we mentioned before, we consider the model of threads after the system startup when
at least one thread is created for each processor. We will distinguish such threads by their
identifiers.
Definition 8.9 (Initial Thread). By the initial thread we mean a thread created for a given
processor during the initialization of the kernel or a process creation. Such a thread can
always be scheduled and never exits. For clarity, in this work we assume that the identifier
of the initial thread of a given processor is equal to the ID of this processor.
Obviously, we are interested in the semantics of the machine defined on well-formed con-
figurations. First, we introduce an auxiliary definition for the configuration of the processor’s
threads.
Definition 8.10 (Well-Formed Processor’s Threads). A configuration k ∈ KprocTh of processor’s
threads is considered to be well-formed if (i) the current thread is present and has a
configuration of the running thread, (ii) all other threads ready for scheduling have
configurations of non-running ones, (iii) – (iv) all non-running threads have the processor
affinity of the running thread, (v) threads cannot be ready for scheduling and finished
simultaneously, (vi) the contexts of threads ready for scheduling are well-formed, (vii) the
stacks of all threads do not overlap, and (viii) the initial thread for the processor is
always present in the model.
Formally, using the shorthands
thr(t) ≡ k.ready(t) Dr ≡ dom (k.ready)
thf (t) ≡ k.fin(t) Df ≡ dom (k.fin)
sbat ≡ thr(t).sba msst ≡ thr(t).mss
and for ths : Nmtid ⇀ KrunTh ∪ KnrunTh
valid stackTh (ths) def≡ ∀p, q ∈ dom (ths) . p ≠ q =⇒ AstackMX (sbap, mssp) ∩ AstackMX (sbaq, mssq) = ∅
we define
wfthpi,θproc(k)
def≡
(i) k.ct ∈ Dr ∧ thr(k.ct) ∈ KrunTh
(ii) ∀t ∈ Dr \ {k.ct}. thr(t) ∈ KnrunTh
(iii) ∀t ∈ Dr \ {k.ct}. thr(t).pid = pidrun(thr(k.ct))
(iv) ∀t ∈ Df . thf (t).pid = pidrun(thr(k.ct))
(v) Dr ∩Df = ∅
(vi) ∀t ∈ Dr \ {k.ct}. wfcntxpi,θMX (thr(t).ac, thr(t).ic)
(vii) valid stackTh (k.ready ⊎ k.fin)
(viii) pidrun(thr(k.ct)) ∈ Dr
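The stack disjointness required in condition (vii) amounts to a pairwise interval check. The following C sketch illustrates it under loudly stated assumptions: the set AstackMX (sba, mss) is not spelled out in this section, so the code assumes that a stack occupies the byte range [sba, sba + mss); with a different layout (e.g., a stack growing downwards from sba) only the computation of the interval bounds would change. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stack allocation of one thread: base address and maximal size.
 * Assumption: the region is the byte range [sba, sba + mss). */
typedef struct {
    uint32_t sba;
    uint32_t mss;
} stack_region_t;

/* True iff the regions of all pairs of distinct threads are disjoint. */
static bool stacks_disjoint(const stack_region_t *s, size_t n)
{
    for (size_t p = 0; p < n; p++) {
        for (size_t q = p + 1; q < n; q++) {
            /* 64-bit bounds avoid overflow of sba + mss. */
            uint64_t lo_p = s[p].sba, hi_p = lo_p + s[p].mss;
            uint64_t lo_q = s[q].sba, hi_q = lo_q + s[q].mss;
            /* An empty region intersects nothing. */
            if (s[p].mss == 0 || s[q].mss == 0) {
                continue;
            }
            if (lo_p < hi_q && lo_q < hi_p) {
                return false;
            }
        }
    }
    return true;
}
```

For the well-formedness of the whole machine the same check would be applied to the stacks of the ready, finished, and new threads together.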
As a special case of non-running threads we consider newly created threads that can be ei-
ther in the list new of the sequential machine configuration or even in ready after acquiring
(shown in the transition function later). The stacks of such threads have one C-IL frame of the
corresponding entry function.
Definition 8.11 (Well-Formed Configuration of New Thread). Given a configuration $th \in K_{nrunTh}$ of a non-running thread, we say that it is a well-formed new thread if it has an empty list of the inactive context and the active context is represented by a single well-formed C-IL frame of a thread entry function $f$ implemented in $\pi$. The location counter of the frame points to the beginning of the function’s body. All local variables (except the parameter) in $\mathcal{M}'_E$ are initialized with zeros.
\[
wfth^{\pi,\theta}_{new}(th) \overset{def}{\equiv}
\begin{array}{ll}
(i) & |th.ac| = 1 \wedge th.ac[1] \in frame_{CIL} \wedge th.ic = \varepsilon \\
(ii) & wfstack^{\pi,\theta}_{CIL}(th.ac) \\
(iii) & th.ac[1] = (f, \bot, 1, \mathcal{M}'_E)
\end{array}
\]
The formal definition of the initialization for the components $f$ and $\mathcal{M}'_E$ will be considered in the transition function and we skip it here for brevity.
Definition 8.12 (Well-Formed Configuration of Sequential Machine for Kernel Threads). Finally, the well-formedness of machine configurations $c \in C_{Th}$ requires that (i) the configuration of the processor’s threads is well-formed, (ii) the MXA machine configuration $c_{mxa} \equiv conf^{curT}_{MXA}(c)$ obtained for the current thread is well-formed, and (iii) if the thread can access the shared resources new and free, then (a) all new threads have well-formed configurations, (b) no stacks in the machine overlap, and (c) no thread is in more than one state simultaneously.
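Using the shorthands $D_r$, $D_f$ from Definition 8.10, we have
\[
wfconf^{\pi,\theta,cba}_{Th}(c) \overset{def}{\equiv}
\begin{array}{ll}
(i) & wfth^{\pi,\theta}_{proc}(c.k) \\
(ii) & wfconf^{\pi,\theta,cba}_{MXA}(c_{mxa}) \\
(iii) & c.ap = 0 \vee c.ap = pid_{cur}(c) \implies \\
 & \quad (a)\ \forall t \in dom(c.new).\ wfth^{\pi,\theta}_{new}(c.new(t)) \\
 & \quad (b)\ valid\_stack_{Th}(c.k.ready \uplus c.k.fin \uplus c.new) \\
 & \quad (c)\ \forall T, T' \in \{D_r, D_f, dom(c.new), c.free\}.\ T \neq T' \implies T \cap T' = \emptyset
\end{array}
\]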
Note that the premise of (iii) is needed in order to satisfy the requirements on the well-formedness introduced in Assumption 2.2 from Chapter 2.
Definition 8.13 (Transition Function of Sequential Machine for Kernel Threads). For the given program $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$ from Section 8.1.1, the environment parameters $\theta \in Params_{CIL}$, and the code base addresses $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, the transitions of the sequential machine for kernel threads are defined by the partial function
\[
\delta^{\pi,\theta,cba}_{Th} : C_{Th} \times \Sigma_{MXA} \rightharpoonup C_{Th\bot}
\]
where $C_{Th\bot} \overset{def}{\equiv} C_{Th} \cup \{\bot\}$ contains the error state, coming from the mixed machine semantics as well as the kernel threads semantics being defined here. The input alphabet $\Sigma_{MXA}$ serves the same purpose as in Definition 7.26 because the current thread can execute inline assembly and guest / process steps.
Given a configuration $c \in C_{Th}$, an input $in \in \Sigma_{MXA}$, and using the shorthands
\[
\begin{aligned}
c_{mxa} &\equiv conf^{curT}_{MXA}(c) & c_{mx} &\equiv conf^{MXA}_{MX}(c_{mxa}) \\
c_{cil} &\equiv conf_{CIL}(c_{mx}) & stmt_{cil} &\equiv stmt_{next}(c_{cil}, \pi_{cil})
\end{aligned}
\]
we now define in detail the steps of the sequential machine for kernel threads. Among those we distinguish the primitives’ execution and the cases similar to the extended mixed machine MXA but simplified because we will restrict the stack substitution in our model.
MXA Machine Steps
• Mixed machine step:
\[
mx(c_{mxa}.k) \wedge in \in \Sigma_{MX} \wedge \neg start^{\pi}_{isa}(c_{mxa}) \wedge \big( cil(c_{mx}.ac) \wedge stmt_{cil} = \text{call } f(E) \implies f \notin F^{prim}_{name} \big)
\]
When the machine performs a pure MX step without a primitive call or a switch to inline assembly, the semantics is the same as for MXA.
So, performing the MX step
\[
c'_{mx} \equiv \delta^{\pi,\theta}_{MX}(c_{mx}, in)
\]
we get, for $c'_{mx} \neq \bot$, the new configuration of the MX thread
\[
k'_{mx} \equiv (c'_{mx}.ac, c'_{mx}.ic, c'_{mx}.spr)
\]
and compose the new configuration $c' \in C_{Th}$ such that only the configuration of the current thread and the memory are updated as
\[
c'.k.ready(c.k.ct).k_{mx} = k'_{mx} \qquad c'.\mathcal{M} = c'_{mx}.\mathcal{M}
\]
All other components of $c'$ are the same as in $c$ and we skip here the full formal definition for brevity.
Hence, the result of the step of the machine for kernel threads is defined as
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c' & : c'_{mx} \neq \bot \\
\bot & : \text{otherwise}
\end{cases}
\]
• Switch to inline assembly:
\[
start^{\pi}_{isa}(c_{mxa}) \wedge in \in \Sigma_A \wedge \big( dom(in.mst) = A^{\pi,\theta}_{stack}(c_{mxa}, info, c_{mxa}.k.sba) \big) \wedge suit^{\pi,\theta,cba}_{start}(c_{mxa}, in, info)
\]
Similarly to the definition of the same step for the MXA machine we execute the first instruction of the inline assembly:
\[
d' \equiv \delta_{rMIPS}\big( d, (core, w_I, w_D, 0^{256}) \big)
\]
with d ≡ conf startrMIPS(cmxa, in) and wI = wD =  Cwalk.
However, if $end^{\pi,\theta,cba}_{isa}(d'.cpu)$ holds, then, in contrast to the MXA machine semantics, where one attempts to find a new pair of the stack base address and the maximal stack size, here we try to reconstruct the stack identified only by $sba_{cur}(c)$ and $mss_{cur}(c)$ belonging to the current thread, which is indicated by
\[
matchst_{cur}(c, d') \overset{def}{\equiv} spv', bpv' \in A^{stack}_{MX}(sba_{cur}(c), mss_{cur}(c)) \wedge \langle spv' \rangle \leq \langle bpv' \rangle
\]
with $spv' \equiv d'.cpu.core.gpr(sp)$ and $bpv' \equiv d'.cpu.core.gpr(bp)$. If this condition does not hold, we generate the run-time error $\bot$. Otherwise, one tries to reconstruct the MX thread configuration
\[
k'_{mx} \equiv R^{\pi,\theta}_{K_{MX}}(d', info, cba, sba_{cur}(c))
\]
and in case of $k'_{mx} \neq \bot$ computes the new configuration of the running thread
\[
th' \equiv (k'_{mx}, sba_{cur}(c), mss_{cur}(c), d'.cpu.tlb)
\]
Hence, the transition function for this step is defined as follows:
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c' & : end^{\pi,\theta,cba}_{isa}(d'.cpu) \wedge matchst_{cur}(c, d') \wedge k'_{mx} \neq \bot \\
\bot & : end^{\pi,\theta,cba}_{isa}(d'.cpu) \wedge \neg matchst_{cur}(c, d') \\
c'' & : \text{otherwise}
\end{cases}
\tag{8.1}
\]
where, depending on the case, we get from $c$ either the machine configuration $c' \in C_{Th}$ with the updated MX thread such that
\[
c'.k.ready(c.k.ct) = th' \qquad c'.\mathcal{M} = d'.m
\]
or the configuration $c'' \in C_{Th}$ where only the processor and memory components are updated
\[
c''.k.ready(c.k.ct).cpu = d'.cpu \qquad c''.\mathcal{M} = d'.m
\]
• Inline assembly, compiled code, or guest/process step:
\[
isa(c_{mxa}.k) \wedge in \in \Sigma_{MIPS} \wedge \big( in = (core, w_I, w_D, eev) \implies \neg eev[0] \big) \wedge \delta_{rMIPS}\big( conf^{MXA}_{rMIPS}(c_{mxa}), in \big) = d'
\]
where $d'$ is some SB reduced MIPS-86 configuration such that the transition function $\delta_{rMIPS}$ is defined.
Similarly to the MXA semantics, we use this $d'$ and compute $k'_{mx}$ and $th'$ as in the previous case. Then, $\delta^{\pi,\theta,cba}_{Th}(c, in)$ is again defined by the equation (8.1).
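The stack matching condition checked at the end of an inline assembly block can be read as a simple range test. The following C sketch assumes, as an illustration only, that the stack region of the current thread consists of the addresses from sba - mss + 1 up to sba; the function names in_stack and match_stack are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed membership test for the stack region [sba - mss + 1, sba]. */
bool in_stack(uint32_t a, uint32_t sba, uint32_t mss)
{
    return a <= sba && sba - a < mss;
}

/* Both sp and bp lie in the stack of the current thread and sp <= bp. */
bool match_stack(uint32_t sp, uint32_t bp, uint32_t sba, uint32_t mss)
{
    return in_stack(sp, sba, mss) && in_stack(bp, sba, mss) && sp <= bp;
}
```

If the test fails when the assembly block ends, the abstract machine produces the run-time error; otherwise the reconstruction of the MX thread configuration is attempted.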
Primitives’ Execution
In order to indicate the execution of any of the aforementioned primitives, we define a predicate requiring that (i) the active context of the current thread is C-IL and (ii) a primitive is called by its name such that the types of all parameters match the function declaration:
\[
prim^{\pi,\theta}_{Th}(c) \overset{def}{\equiv}
\begin{array}{ll}
(i) & mx(c_{mxa}.k) \wedge cil(c_{mx}.ac) \\
(ii) & \exists f \in F^{prim}_{name}.\ stmt_{cil} = \text{call } f(E) \wedge |E| = F^{\theta}_{\pi}(f).npar\ \wedge \\
 & \quad \forall i \in [1 : |E|].\ F^{\theta}_{\pi}(f).V[i] = (v, t) \implies \tau^{\pi,\theta}_{c_{cil}}(E[i]) = t
\end{array}
\]
We also introduce the function $inc^{cur}_{loc}(c)$ incrementing the location counter of the current thread in this case:
\[
inc^{cur}_{loc}(c) \overset{def}{\equiv} c'
\]
where $c'$ is obtained from $c$ by only updating
\[
c'.k.ready(c.k.ct).k_{mx}.ac = inc_{loc}(c_{cil}).stack
\]
Then, if the predicate $prim^{\pi,\theta}_{Th}(c)$ holds, depending on the primitive’s name we distinguish between the following cases:
• Thread switch: $stmt_{cil} = \text{call thread\_run}(E)$
From the list of arguments $E$ with a single argument $e_{tid} = E$ we compute by expression evaluation the value $[[e_{tid}]]^{\pi_{cil},\theta}_{c_{cil}} = val(a, u32)$ and the identifiers of the threads involved in the switch:
\[
tid_{to} \equiv \langle a \rangle \qquad tid_{from} \equiv c.k.ct
\]
The thread switch can be performed successfully if $tid_{to}$ is present among the threads ready for scheduling, which is indicated by the predicate
\[
thswitch^{\pi,\theta}(c) \overset{def}{\equiv} tid_{to} \neq tid_{from} \wedge tid_{to} \in dom(c.k.ready)
\]
In this case, by incrementing the location counter of the current thread and setting the current thread component to $tid_{to}$ we get the intermediate configurations
\[
c' \equiv inc^{cur}_{loc}(c) \qquad c'' \equiv c'\big[ k := c'.k[ct := tid_{to}] \big]
\]
In turn, the configurations of the threads before and after the transformation needed for the switch are
\[
\begin{aligned}
th_{from} &\equiv th_{cur}(c') & th_{to} &\equiv th_{cur}(c'') \\
th'_{from} &\equiv sleep(th_{from}) & th'_{to} &\equiv wakeup(th_{to}, spr', tlb')
\end{aligned}
\]
with $spr' \equiv th_{from}.k_{mx}.spr$ and $tlb' \equiv th_{from}.tlb$.
Therefore, the final configuration $c'''$ of the machine with kernel threads in case of the successful switch is obtained from $c''$ by updating the configurations of the threads involved in the switch as follows:
\[
c'''.k.ready(tid_{from}) = th'_{from} \qquad c'''.k.ready(tid_{to}) = th'_{to}
\]
All other components of $c'''$ are equal to the corresponding ones of $c''$.
Hence, the result of the thread switch is defined as
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c''' & : thswitch^{\pi,\theta}(c) \\
c' & : tid_{to} = tid_{from} \\
\bot & : \text{otherwise}
\end{cases}
\]
• Thread exit: $stmt_{cil} = \text{call thread\_exit\_to}(E)$
In order to define the semantics of the thread exit, we use $tid_{to}$, $tid_{from}$, $c'$, $c''$, $th'_{from}$, and $th'_{to}$ computed in the previous case.
In comparison to the thread switch, the successful exit to another thread can only be performed by a thread that is not initial for the processor (see Definitions 8.9 and 8.10). Therefore, if the condition
\[
thexit^{\pi,\theta}(c) \overset{def}{\equiv} thswitch^{\pi,\theta}(c) \wedge tid_{from} \neq pid_{cur}(c)
\]
holds, we obtain the new machine configuration $c'''$ by moving the exiting thread to the finished ones and updating $c''$ in the following way:
\[
c'''.k.ready(x) =
\begin{cases}
th'_{to} & : x = tid_{to} \\
c''.k.ready(x) & : x \in dom(c.k.ready) \setminus \{tid_{to}, tid_{from}\} \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
\[
c'''.k.fin(x) =
\begin{cases}
th'_{from} & : x = tid_{from} \\
c''.k.fin(x) & : \text{otherwise}
\end{cases}
\]
Finally, the transition function for the thread exit step is defined as
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c''' & : thexit^{\pi,\theta}(c) \\
c' & : tid_{to} = tid_{from} \\
\bot & : \text{otherwise}
\end{cases}
\]
Note that the introduction of this primitive in the semantics allows leaving a function even in the bottom frame of the stack, which is not possible in MASM, C-IL, and MX considered in the previous chapters of this thesis and in [Sha12, Sch13].
• Thread deletion: $stmt_{cil} = \text{call thread\_delete}(E)$
Performing the expression evaluation of the single argument $E = e_{tid}$ we get the identifier of the thread to be deleted
\[
[[e_{tid}]]^{\pi_{cil},\theta}_{c_{cil}} = val(a, u32) \qquad tid \equiv \langle a \rangle
\]
The result of the computation depends on whether the thread can access the component $c.free$. If it is not in use by any processor (i.e., it is free) and the thread $tid$ is finished
\[
c.ap = 0 \wedge tid \in dom(c.k.fin)
\]
one computes the resulting configuration $c'$ by updating the following components in $c$:
\[
c'.free = c.free \cup \{tid\} \qquad c'.ap = pid_{cur}(c)
\]
\[
c'.k.fin(x) =
\begin{cases}
c.k.fin(x) & : x \neq tid \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
where $c'.ap$ indicates that $c'.free$ is now in use (successfully acquired and locked) by the processor $pid_{cur}(c)$ and cannot be accessed by others because the execution of the primitive is not complete yet.
Hence, the semantics of the step is defined by a case split on the phases corresponding to
– $c.ap \neq 0$: an unsuccessful attempt to acquire $c.free$,
– $c.ap = 0$: its acquisition with the configuration update considered above, and
– $c.ap = pid_{cur}(c)$: finishing the execution of the primitive by releasing $c.free$ and incrementing the location counter.
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c' & : c.ap = 0 \wedge tid \in dom(c.k.fin) \\
\bot & : c.ap = 0 \wedge tid \notin dom(c.k.fin) \\
inc^{cur}_{loc}(c)[ap := 0] & : c.ap = pid_{cur}(c) \\
c & : \text{otherwise}
\end{cases}
\]
Note that according to the last case in the definition of $\delta^{\pi,\theta,cba}_{Th}(c, in)$, we stay at the primitive call as long as we cannot access $c.free$ occupied by some other processor. At the beginning of each phase we will obviously require that the machine configuration $c$ is well-formed wrt. Definition 8.10.
It is important to mention that in comparison to the Verisoft project [Ver10] and [PBLS15]
where the execution of a CVM primitive (see also related works in Chapter 1) on a single-
core machine is modelled as a single atomic step, this thesis is the first document giving
the specification of primitives using the lock protection in their implementation on the
multi-core processor architecture.
We proceed with the semantics of the thread creation and acquisition, which also perform global operations (recall Figure 8.1) and, therefore, are specified in a similar way.
• Thread creation: $stmt_{cil} = \text{call thread\_create}(E)$
Let the list $E$ of expressions passed into the function as parameters be
\[
E = e_{tid} \circ e_{pid} \circ e_{fn} \circ e_{arg} \circ e_{sba} \circ e_{mss}
\]
We evaluate the expressions with subscripts $s \in \{sba, mss, pid, tid\}$ as
\[
[[e_s]]^{\pi_{cil},\theta}_{c_{cil}} = val(a_s, t_s)
\]
Then, if the conditions
\[
c.ap = 0 \wedge \langle a_{tid} \rangle \in dom(c.free)
\]
hold, we compute a new machine configuration $c' \in C_{Th}$ by updating $c$ in the following way:
\[
c'.free = c.free \setminus \{\langle a_{tid} \rangle\} \qquad c'.ap = pid_{cur}(c)
\]
\[
c'.new(x) =
\begin{cases}
th & : x = \langle a_{tid} \rangle \\
c.new(x) & : \text{otherwise}
\end{cases}
\]
where $th$ is a configuration of the new thread
\[
th \equiv (frame, a_{sba}, \langle a_{mss} \rangle, \langle a_{pid} \rangle)
\]
with the C-IL frame composed as
\[
frame \equiv (f, \bot, 1, \mathcal{M}'_E)
\]
such that the function name $f$ satisfies
\[
isfunc([[e_{fn}]]^{\pi_{cil},\theta}_{c_{cil}}, f, \theta) \wedge f \in dom(F^{\theta}_{\pi_{cil}}) \wedge \neg ext(f, \pi_{cil}, \theta)
\]
and the local memory is initialized wrt. the input argument
\[
\mathcal{M}'_E(v) =
\begin{cases}
val2bytes\big( [[e_{arg}]]^{\pi_{cil},\theta}_{c_{cil}} \big) & : F^{\theta}_{\pi_{cil}}(f).V[1] = (v, t) \\
(0^8)^{size_{\theta}(qt2t(t))} & : F^{\theta}_{\pi_{cil}}(f).V[i] = (v, t) \wedge i > 1 \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
Hence, using the predicate $vst(c') \equiv valid\_stack_{Th}(c'.k.ready \uplus c'.k.fin \uplus c'.new)$ indicating that the stack of the new thread does not overlap with the stacks of other threads, the step is defined as
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c' & : c.ap = 0 \wedge \langle a_{tid} \rangle \in dom(c.free) \wedge vst(c') \\
inc^{cur}_{loc}(c)[ap := 0] & : c.ap = pid_{cur}(c) \\
\bot & : c.ap = 0 \wedge (\langle a_{tid} \rangle \notin dom(c.free) \vee \neg vst(c')) \\
c & : \text{otherwise}
\end{cases}
\]
where the processor $pid_{cur}(c)$ locks/releases the components free and new.
• Thread acquisition: $stmt_{cil} = \text{call thread\_acquire}()$
By the thread acquisition we move all new threads with affinity equal to $pid_{cur}(c)$ to the configuration of the processor’s threads. So, if the new threads can be accessed, i.e., $c.ap = 0$ holds, we update the configuration $c$ such that $c'$ differs from $c$ in the following:
\[
c'.k.ready(x) =
\begin{cases}
c.new(x) & : c.new(x).pid = pid_{cur}(c) \\
c.k.ready(x) & : \text{otherwise}
\end{cases}
\]
\[
c'.new(x) =
\begin{cases}
c.new(x) & : c.new(x).pid \neq pid_{cur}(c) \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
\[
c'.ap = pid_{cur}(c)
\]
Hence, the step of the machine is defined as
\[
\delta^{\pi,\theta,cba}_{Th}(c, in) \overset{def}{\equiv}
\begin{cases}
c' & : c.ap = 0 \\
inc^{cur}_{loc}(c)[ap := 0] & : c.ap = pid_{cur}(c) \\
c & : \text{otherwise}
\end{cases}
\]
In other cases or when the expression evaluation cannot be successfully performed, the result
of the step δpi,θ,cbaTh (c, in) is undefined.
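To make the case distinctions of the primitives more tangible, the following C sketch models the thread switch, the thread exit, and the three-phase thread deletion on a drastically simplified state. All names (th_state_t, step_t, the *_step functions, MTID) are hypothetical, thread configurations are reduced to membership flags, and contexts, stacks, and memory are omitted; it is an illustration under these assumptions, not the formal transition function.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTID 8u /* hypothetical maximal thread identifier */

typedef struct {
    bool     ready[MTID + 1]; /* dom(k.ready) of the current processor */
    bool     fin[MTID + 1];   /* dom(k.fin) of the current processor */
    bool     free_[MTID + 1]; /* global set free */
    uint32_t loc[MTID + 1];   /* location counter per thread */
    uint32_t ct;              /* current thread k.ct */
    uint32_t pid;             /* pid_cur, also the ID of the initial thread */
    uint32_t ap;              /* 0, or the processor holding new/free */
} th_state_t;

typedef enum { STEP_OK, STEP_STAY, STEP_ERROR } step_t;

/* Thread switch: error unless the target is the caller or a ready thread. */
step_t thread_run_step(th_state_t *s, uint32_t tid_to)
{
    uint32_t tid_from = s->ct;
    if (tid_to != tid_from && (tid_to > MTID || !s->ready[tid_to]))
        return STEP_ERROR;
    s->loc[tid_from]++;   /* increment the location counter of the caller */
    s->ct = tid_to;       /* unchanged if tid_to == tid_from */
    return STEP_OK;
}

/* Thread exit: like the switch, but the initial thread must not exit. */
step_t thread_exit_step(th_state_t *s, uint32_t tid_to)
{
    uint32_t tid_from = s->ct;
    if (tid_to == tid_from) { s->loc[tid_from]++; return STEP_OK; }
    if (tid_to > MTID || !s->ready[tid_to] || tid_from == s->pid)
        return STEP_ERROR;
    s->loc[tid_from]++;
    s->ready[tid_from] = false;   /* move the caller from ready ... */
    s->fin[tid_from] = true;      /* ... to the finished threads */
    s->ct = tid_to;
    return STEP_OK;
}

/* Thread deletion: acquire phase, release phase, or waiting. */
step_t thread_delete_step(th_state_t *s, uint32_t tid)
{
    if (s->ap == 0) {                         /* acquisition phase */
        if (tid > MTID || !s->fin[tid]) return STEP_ERROR;
        s->fin[tid] = false;
        s->free_[tid] = true;
        s->ap = s->pid;
        return STEP_OK;
    }
    if (s->ap == s->pid) {                    /* release phase */
        s->loc[s->ct]++;
        s->ap = 0;
        return STEP_OK;
    }
    return STEP_STAY;                         /* held by another processor */
}
```

Calling thread_delete_step twice on the same processor corresponds to the two phases of one primitive execution: the first call performs the update and takes ownership via ap, the second one releases ownership and advances the location counter.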
8.1.3 Concurrent Semantics of Kernel Threads
Now, the concurrent model of kernel threads can be easily defined on the base of the definitions
from the previous section.
Definition 8.14 (Configuration of Concurrent Machine for Kernel Threads). Configurations
of the concurrent machine for kernel threads with np running threads (equal to the number
of processors) are represented by tuples that differ from CTh only by the mapping k from the
processor identifier to the configuration of the processor’s threads.
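\[
C_{cTh} \overset{def}{\equiv} \big( k : \mathbb{N}_{np} \to K_{procTh},\ new : \mathbb{N}_{mtid} \rightharpoonup K_{nrunTh},\ free \in 2^{\mathbb{N}_{mtid}},\ ap \in [0 : np],\ \mathcal{M} : \mathbb{B}^{32} \to \mathbb{B}^{8} \big)
\]
Definition 8.15 (Sequential Machine Configuration from the Concurrent Machine for Kernel Threads). For a configuration $c \in C_{cTh}$ of the concurrent machine for kernel threads and $t \in \mathbb{N}_{np}$ we define $conf_{Th}(c, t) \in C_{Th}$ such that
\[
conf_{Th}(c, t) \overset{def}{\equiv} (c.k(t), c.new, c.free, c.ap, c.\mathcal{M})
\]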
Definition 8.16 (Transition Function of the Concurrent Machine for Kernel Threads). Then, for $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$, $\theta \in Params_{CIL}$, and $cba = (cba_{cil}, cba_\mu) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, the transitions of the concurrent machine for kernel threads are defined by the function
\[
\delta^{\pi,\theta,cba}_{cTh} : C_{cTh} \times \mathbb{N}_{np} \times \Sigma_{MXA} \rightharpoonup C_{cTh\bot}
\]
such that for a configuration $c \in C_{cTh}$, a processor with index $t \in \mathbb{N}_{np}$ running a kernel thread, and an input $in \in \Sigma_{MXA}$, the result of the transition is computed as
\[
\delta^{\pi,\theta,cba}_{cTh}(c, t, in) \overset{def}{\equiv}
\begin{cases}
\big( c.k[t \mapsto c'_t.k],\ c'_t.new,\ c'_t.free,\ c'_t.ap,\ c'_t.\mathcal{M} \big) & : c'_t \neq \bot \\
\bot & : \text{otherwise}
\end{cases}
\]
where $c'_t$ is the next configuration of the sequential machine for threads belonging to the processor $t$
\[
c'_t \equiv \delta^{\pi,\theta,cba}_{Th}(conf_{Th}(c, t), in)
\]
If $c'_t$ does not exist, the result of the step $\delta^{\pi,\theta,cba}_{cTh}(c, t, in)$ is undefined.
Naturally, the transition function for $t$ assumes that the corresponding configuration $conf_{Th}(c, t)$ is well-formed according to Definition 8.12.
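The construction of Definitions 8.15 and 8.16 (project, step sequentially, write back) can be sketched in C as follows. The types and the function demo_step are hypothetical stand-ins: the processor's thread configuration is reduced to a single field, and the shared components new, free, and the memory are represented only by ap.

```c
#include <stdbool.h>
#include <stdint.h>

#define NP 2u /* hypothetical number of processors, numbered from 1 */

typedef struct { uint32_t ct; } proc_threads_t;                 /* stand-in for K_procTh */
typedef struct { proc_threads_t k; uint32_t ap; } seq_conf_t;   /* stand-in for C_Th */
typedef struct { proc_threads_t k[NP + 1]; uint32_t ap; } conc_conf_t; /* stand-in for C_cTh */

/* A sequential step; returns false to signal the error state. */
typedef bool (*seq_step_fn)(seq_conf_t *c);

/* Projection: the view of processor t on the concurrent configuration. */
seq_conf_t conf_th(const conc_conf_t *c, uint32_t t)
{
    seq_conf_t s = { c->k[t], c->ap };
    return s;
}

/* Concurrent step of processor t: only k[t] and the shared part change. */
bool conc_step(conc_conf_t *c, uint32_t t, seq_step_fn step)
{
    seq_conf_t s = conf_th(c, t);
    if (!step(&s))
        return false;   /* error: c is left unchanged in this sketch */
    c->k[t] = s.k;
    c->ap = s.ap;
    return true;
}

/* Hypothetical sequential step used only for demonstration. */
bool demo_step(seq_conf_t *c)
{
    c->k.ct += 1;
    return true;
}
```

The point of the sketch is that a step of processor t never touches the thread configurations of other processors; interference is possible only through the shared components.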
8.2 Implementation of Kernel Threads
The abstract machine for kernel threads considered above is implemented by the MXA machine
executing a program obtained from pi and the framework for kernel threads.
8.2.1 Framework for Kernel Threads
The kernel threads framework is a program $\pi^{th} = (\pi^{th}_\mu, \pi^{th}_{cil}) \in Prog_{MX}$ containing the implementation of the aforementioned primitives and all data structures and parameters needed for it. This program is linked [Tsy09, IdR09] with $\pi$ from Section 8.1.1, compiled together with it, and placed into the memory.
8.2.1.1 Data Structures
First, we consider the data structures and constants declared in the C-IL program $\pi^{th}_{cil}$.
In the program we fix the number of running kernel threads to the constant NP equal to $np$. In turn, the constant TID_MAX corresponds to the maximal thread identifier $mtid$. Moreover, we denote thread states by the following constants:
#define TS_FREE 0
#define TS_NEW 1
#define TS_RUNNING 2
#define TS_SLEEPING 3
#define TS_FINISHED 4
The C-IL composite type struct TCB represents the thread control block (TCB) containing all
data characterizing a kernel thread:
typedef struct TCB {
u32 tid; // thread ID
u32 state; // thread state
u32 pid; // processor ID
thfun_t fn; // thread entry function
u32 arg; // thread function argument
void *sba; // stack base address
u32 mss; // maximal stack size
thcntx_t cntx; // thread context
} TCB_t;
where the thread context contains values of stack and frame base pointers of the topmost frame
if the corresponding thread is non-running:
typedef struct Thcntx {
void *sp;
void *bp;
} thcntx_t;
The components of the abstract machine configuration are implemented as doubly linked
lists with nodes containing TCBs:
typedef struct Thread {
TCB_t tcb; // thread control block
struct Thread *next; // link to the next node
struct Thread *prev; // link to the previous node
} thread_t;
such that the nodes for all threads available in the kernel are kept in an array threads with
elements numbered by thread identifiers starting from 1:
thread_t threads [TID_MAX+1];
The element with index 0 is not used for simplicity here. Obviously, if a thread is not present, its state is marked as TS_FREE and a new thread can be created with the same identifier.
For modeling the lists we support pointers and arrays of pointers to tails and heads of the
lists:
// Pointer to a global list of new threads
thread_t *new_hd;
thread_t *new_tl;
// Pointer to a global list of free threads
thread_t *free_hd;
thread_t *free_tl;
Listing 8.1: Stack switch procedure in $\pi^{th}_\mu$.
switch_stack USES sv1 sv2 sv3 sv4 sv5 sv6 sv7 sv8
1: asm {
// save sp and bp to the thread context
sw sp i1 0
sw bp i2 0
// restore sp and bp from another thread context
addi sp i3 0
addi bp i4 0
// load argument for the entry function of a new thread
lw i1 bp 28 };
2: ret;
// Pointers to local lists of running and ready for scheduling threads
thread_t *ready_hd [NP+1];
thread_t *ready_tl [NP+1];
// Pointers to local lists of finished threads
thread_t *fin_hd [NP+1];
thread_t *fin_tl [NP+1];
Note again, that we assume the processors to be numbered starting from 1 and, therefore, do
not use the elements with index 0.
For the work on such lists we implement the classical operations (see Appendix B):
• search_by_tid – search for a list node of a thread with a given identifier,
• search_by_pid – search for a list node of a thread with a given processor identifier,
• remove – removing a thread list node,
• insert_to_end – inserting a node at the end of a thread list.
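The actual implementations are given in Appendix B. As a rough orientation, the operations on a doubly linked list with separate head and tail pointers could look as in the following sketch. The node type is simplified (the TCB is reduced to tid and pid), the signatures are inferred from the calls in the listings of this section, and remove is named list_remove here to avoid a clash with the C standard library; all of this is an assumption and not the code of the framework.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Simplified list node: only the fields needed for the list operations. */
typedef struct Thread {
    u32 tid;
    u32 pid;
    struct Thread *next;
    struct Thread *prev;
} thread_t;

/* Return the first node with the given thread ID, or NULL if absent. */
thread_t *search_by_tid(thread_t *hd, u32 tid)
{
    for (thread_t *n = hd; n != NULL; n = n->next)
        if (n->tid == tid)
            return n;
    return NULL;
}

/* Unlink node th from the list given by its head and tail pointers. */
void list_remove(thread_t **hd, thread_t **tl, thread_t *th)
{
    if (th->prev != NULL) th->prev->next = th->next; else *hd = th->next;
    if (th->next != NULL) th->next->prev = th->prev; else *tl = th->prev;
    th->next = NULL;
    th->prev = NULL;
}

/* Append node th at the tail of the list. */
void insert_to_end(thread_t **hd, thread_t **tl, thread_t *th)
{
    th->next = NULL;
    th->prev = *tl;
    if (*tl != NULL) (*tl)->next = th; else *hd = th;
    *tl = th;
}
```

Since list_remove clears the links of the removed node in this sketch, a caller iterating over the list has to read th->next before removing th.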
The accesses to the global lists of new and free threads are protected by acquiring and releasing the spinlock³
lock_t lock;
by the C-IL functions (see Appendix A):
void acquire_lock(lock_t *lock);
void release_lock(lock_t *lock);
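The lock functions of the framework are defined in Appendix A. Purely for illustration, a test-and-set spinlock with the same interface can be written with C11 atomics as below; the representation of lock_t as a structure with an atomic_flag is an assumption of this sketch and need not coincide with the MIPS-86 based implementation of the thesis.

```c
#include <stdatomic.h>

/* Hypothetical lock representation: a single atomic flag. */
typedef struct {
    atomic_flag held;
} lock_t;

/* Spin until the flag was previously clear, i.e., until we set it first. */
void acquire_lock(lock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire)) {
        /* busy wait */
    }
}

/* Clear the flag so that another processor can acquire the lock. */
void release_lock(lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A lock object is initialized in the released state with ATOMIC_FLAG_INIT, e.g. lock_t lock = { ATOMIC_FLAG_INIT };.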
In order to determine the current thread for each processor, in the program $\pi^{th}_{cil}$ we also declare an array of the thread identifiers:
u32 current [NP+1];
8.2.1.2 Implementation of Primitives
The thread switch is based on the stack substitution implemented in the MASM program $\pi^{th}_\mu$ with inline assembly by the procedure switch_stack (Listing 8.1)⁴. The idea of using such a procedure was taken from [Han96] and the former joint work of the author of this thesis with Ulan Degenbaev in the Verisoft XT project [The11].
³ The verification of operations on locks can be found in [HL09].
⁴ For register names used in MASM programs, please refer to Table 5.1 in Section 5.1.3.
[Figure 8.2: Layout of the stacks for the running and sleeping threads in switch_stack before the return. The figure shows, in this order, the frame for a caller of thread_run, further frames (...), the frame for thread_run, and the frame for switch_stack.]
The procedure is declared as external in the C-IL program $\pi^{th}_{cil}$:
extern void switch_stack(void **oldsp, void **oldbp, void *newsp, void *newbp);
The first two parameters are the pointers to the fields in the context of the current thread. The other two are the values of the stack and frame base pointers of the thread to be scheduled. Loading the argument is only needed for switching to the stack of a newly created thread because we have to provide the argument via register i1 according to the compiler calling convention introduced in Section 5.1.3.
Except for the MASM procedure get_pid (see Appendix A), which reads the processor ID from the corresponding SPR (Table 3.4), all functions of the framework for kernel threads are implemented in $\pi^{th}_{cil}$. Here, we present the implementation of every primitive and describe it briefly.
The thread switch (Listing 8.2) is performed on the local data structures and comprises the following steps. One obtains pointers (variables from and to in the code) to the nodes for the current thread and for the thread we switch to, if the latter is present among the threads ready for scheduling (lines 1-4). In case of success, we change the states of the threads and modify the ID of the current thread of the running processor (lines 6-8). The trickiest part of the thread switch happens after the call of switch_stack (line 9), in the body of this procedure.
First, in the inline assembly (line 1) we substitute the stacks by saving the current content of the registers sp and bp and restoring them from the context belonging to the thread pointed to by to. Note that at this moment the stack of the thread pointed to by from has at least three frames on the top (Figure 8.2): for the caller of thread_run, for the call of thread_run, and for the call of switch_stack, respectively. This layout will be preserved in the memory until the thread is scheduled again. Second, by executing ret (line 2) one either returns control to the sleeping thread we switch to, or starts executing the entry function of a new thread.
In case of the sleeping thread, we continue the execution of the code of this thread starting from the C-IL statement return in the function thread_run (line 10). This operation is feasible because the stack of the sleeping thread also has the layout formed by the call of thread_run and depicted in Figure 8.2. Therefore, after the further return from thread_run, one resumes the thread’s execution at the statement following this call.
In case of the new thread, after ret, the compiled code of the prologue of the entry function is executed. Obviously, the stack of the newly created thread must be prepared properly by the primitive thread_create so that one can correctly return from switch_stack.
During the thread creation (Listing 8.3) we access the global lists of free and new threads, and,
therefore, guarantee the atomicity of the operation via acquiring (line 1) and releasing (line 24)
Listing 8.2: Thread switch.
void thread_run(u32 tid)
{
// Local variables:
u32 pid;
u32 ct;
thread_t *from, *to;
// Function body:
1: pid = call get_pid();
2: ct = current[pid];
3: from = &threads[ct];
4: to = call search_by_tid(ready_hd[pid], tid);
5: ifnot (to != 0 && ct != tid) goto 10;
6: (from->tcb).state = TS_SLEEPING;
7: (to->tcb).state = TS_RUNNING;
8: current[pid] = tid;
9: switch_stack(&((from->tcb).cntx.sp), &((from->tcb).cntx.bp),
(to->tcb).cntx.sp, (to->tcb).cntx.bp);
10: return;
}
Listing 8.3: Thread creation.
void thread_create(u32 tid, u32 pid, thfun_t fn, u32 arg, void *sba, u32 mss)
{
// Local variables:
thread_t *th;
i32 *sp;
// Function body:
1: call acquire_lock(&lock);
2: th = call search_by_tid(free_hd, tid);
3: ifnot (th != 0) goto 24;
// − initialization of TCB
4: (th->tcb).tid = tid;
5: (th->tcb).state = TS_NEW;
6: (th->tcb).pid = pid;
7: (th->tcb).fn = fn;
8: (th->tcb).arg = arg;
9: (th->tcb).sba = sba;
10: (th->tcb).mss = mss;
// −− prepare frame part for fn
11: sp = (i32*)((i32)sba - 3); // address for arg
12: *sp = arg; // storing arg on the stack
13: sp = sp + (-1); // space for dummy return address
// −− prepare frame for switch stack
15: sp = sp + (-4); // home addresses for arguments of switch stack
16: sp = sp + (-1); // space for return address
17: *sp = (i32*) fn; // storing address of fn as return address
18: sp = sp + (-1); // space for dummy pbp
19: (th->tcb).cntx.bp = (void*) sp; // save frame base pointer to the thread context
20: sp = sp + (-8); // dummy space for all callee−save registers
21: (th->tcb).cntx.sp = (void*) sp; // save stack pointer to the thread context
// − remove from list of free threads
22: call remove(&free_hd, &free_tl, th);
// − add to list of new threads
23: call insert_to_end(&new_hd, &new_tl, th);
24: call release_lock(&lock);
25: return;
}
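[Figure 8.3: Physical stack layout of a newly created thread. Starting at the stack base address sba, the stack contains the frame part for the entry function (arg and a dummy return address), followed by the full frame for switch_stack (4 dummy words for arguments, the address of the entry function, a dummy pbp, and the space for all callee-save registers). The pointers bp and sp mark the frame base and the top of the stack, respectively.]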
the spinlock. After the successful acquisition of the lock, we check whether the given thread ID is free (line 2) and then prepare the TCB (lines 4-10) as well as the stack of the new thread including the stack pointer and the frame base pointer (lines 11-21) so that the thread can be correctly scheduled. Both pointers are saved into the thread context in the TCB (lines 19 and 21). The operation finishes by removing the thread node from the list of free threads and adding it to the list of new threads (lines 22-23).
The physical stack layout of the new thread is depicted in Figure 8.3. Since the thread switch is performed by returning from switch_stack, the stack has the full frame for this procedure where the return address is the starting address of the entry function. Note that the value of pbp in this frame will not be used later and, therefore, it is left uninitialized. The prologue of the entry function will compute the actual frame base pointer on the basis of the stack pointer.
Moreover, the entry function is not called explicitly from C-IL. Hence, we also have to prepare the part of its frame that is supposed to be created by a caller, i.e., the allocation of home addresses for arguments and the space for the return address. Since we never return from the entry function by the C-IL statement return, the value of the return address does not matter for the correct execution of the function. The argument, however, is stored on the stack at the home address so that the procedure switch_stack can put it into the register later on during the thread switch.
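The pointer arithmetic of lines 11-21 in Listing 8.3 can be replayed on plain addresses. The sketch below assumes 4-byte words (so that a step of an i32 pointer corresponds to 4 bytes); the structure new_stack_t and the function new_thread_stack are hypothetical helpers introduced only for this calculation.

```c
#include <stdint.h>

#define WORD 4u /* assumed size of an i32 in bytes */

/* Addresses computed while preparing the stack of a new thread. */
typedef struct {
    uint32_t arg_addr;   /* home address of the argument */
    uint32_t ra_fn_addr; /* slot holding the address of the entry function */
    uint32_t bp;         /* frame base pointer saved in the thread context */
    uint32_t sp;         /* stack pointer saved in the thread context */
} new_stack_t;

new_stack_t new_thread_stack(uint32_t sba)
{
    new_stack_t s;
    uint32_t sp = sba - 3u;   /* line 11: word holding arg */
    s.arg_addr = sp;
    sp -= 1u * WORD;          /* line 13: dummy return address */
    sp -= 4u * WORD;          /* line 15: home addresses for switch_stack arguments */
    sp -= 1u * WORD;          /* line 16: space for the return address */
    s.ra_fn_addr = sp;        /* line 17: address of fn is stored here */
    sp -= 1u * WORD;          /* line 18: dummy pbp */
    s.bp = sp;                /* line 19 */
    sp -= 8u * WORD;          /* line 20: callee-save registers */
    s.sp = sp;                /* line 21 */
    return s;
}
```

Under this assumption the argument ends up 28 bytes above the saved frame base pointer, which agrees with the offset used by the instruction lw i1 bp 28 in Listing 8.1.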
The thread exit (Listing 8.4) is similar to the thread switch except that the exiting thread is moved to the list of finished ones. Note that, since a thread initial for a processor is not allowed to exit, the execution of the function has no effect in such a case (lines 3 and 13).
During the acquisition of newly created threads (Listing 8.5) the global list of new threads is
accessed in a loop. Obviously, the operation requires the protection by the same lock used in
Listing 8.4: Thread exit.
void thread_exit_to(u32 tid)
{
// Local variables:
u32 pid;
u32 ct;
thread_t *from, *to;
// Function body:
1: pid = call get_pid();
2: ct = current[pid];
3: ifnot (ct != pid) goto 13; // do not exit from initial threads
4: from = &threads[ct];
5: to = call search_by_tid(ready_hd[pid], tid);
6: ifnot (to != 0 && ct != tid) goto 13;
7: (from->tcb).state = TS_FINISHED;
8: call remove(&ready_hd[pid], &ready_tl[pid], from);
9: call insert_to_end(&fin_hd[pid], &fin_tl[pid], from);
10: (to->tcb).state = TS_RUNNING;
11: current[pid] = tid;
12: switch_stack(&((from->tcb).cntx.sp), &((from->tcb).cntx.bp),
(to->tcb).cntx.sp, (to->tcb).cntx.bp);
13: return;
}
the thread creation. All new threads with the matching processor affinity are deleted from this
list and added into the list of threads ready for scheduling.
The implementation of the thread deletion working on the local and global lists of finished and
free threads respectively is intuitive and given in Listing 8.6.
8.2.2 Program Linking and Obtaining the Implementation Model
As the next step, we consider how the program $\pi$ of the hypervisor / OS kernel is linked together with the framework for the kernel threads $\pi^{th}$.
For simplicity, we assume that the names of declared global variables and composite types are distinct in both programs. Moreover, the function and procedure tables are different except for the additional declaration of external primitives in $\pi_{cil}$.
Definition 8.17 (Program Linking Operator). For the given programs $\pi = (\pi_\mu, \pi_{cil})$ and $\pi^{th} = (\pi^{th}_\mu, \pi^{th}_{cil})$ we define the program linking operator
\[
link(\pi, \pi^{th}) \overset{def}{\equiv} \pi^{imp}
\]
where the resulting linked program $\pi^{imp} \equiv (\pi^{imp}_\mu, \pi^{imp}_{cil})$ is computed as follows:
• linked MASM program
\[
\pi^{imp}_\mu(p) = \pi_\mu \uplus \pi^{th}_\mu
\]
• linked C-IL program
\[
\begin{aligned}
\pi^{imp}_{cil}.V_G &= \pi_{cil}.V_G \circ \pi^{th}_{cil}.V_G \\
\pi^{imp}_{cil}.T_F &= \pi_{cil}.T_F \uplus \pi^{th}_{cil}.T_F \\
\pi^{imp}_{cil}.F &= \pi_{cil}.F|_{D'} \uplus \pi^{th}_{cil}.F
\end{aligned}
\]
Listing 8.5: Acquisition of newly created threads.
void thread_acquire()
{
// Local variables:
u32 pid;
thread_t *th;
thread_t *list;
// Function body:
1: pid = call get_pid();
2: list = new_hd;
3: call acquire_lock(&lock);
4: th = call search_by_pid(list, pid);
5: ifnot (th != 0) goto 10;
6: list = th->next;
7: call remove(&new_hd, &new_tl, th);
8: call insert_to_end(&ready_hd[pid], &ready_tl[pid], th);
9: goto 4;
10: call release_lock(&lock);
11: return;
}
Listing 8.6: Thread deletion.
void thread_delete(u32 tid)
{
// Local variables:
u32 pid;
thread_t *th;
thread_t *list;
// Function body:
1: pid = call get_pid();
2: call acquire_lock(&lock);
3: th = call search_by_tid(fin_hd[pid], tid);
4: ifnot (th != 0) goto 8;
5: (th->tcb).state = TS_FREE;
6: call remove(&fin_hd[pid], &fin_tl[pid], th);
7: call insert_to_end(&free_hd, &free_tl, th);
8: call release_lock(&lock);
9: return;
}
where the domain $D' \equiv dom(\pi_{cil}) \setminus F^{prim}_{name}$ is restricted to all functions declared as well as implemented in $\pi_{cil}$, excluding the primitives because they are present in the function table of the framework.
A more general form of linking operator for any C0 programs can be found in [Tsy09, IdR09].
Since the concurrent MXA machine implementing the abstract machine for kernel threads executes the linked program $\pi^{imp}$, its environment parameters $\theta^{imp} \in Params_{CIL}$ are computed wrt. this program and the placement of the compiled code in the memory. Obviously, we require that $\theta$ can be fully obtained from $\theta^{imp}$:
\[
validpar_{Th}(\theta, \theta^{imp}) \overset{def}{\equiv}
\begin{array}{ll}
(i) & \forall X \in \{size\_t, cast, intrinsics\}.\ \theta.X = \theta^{imp}.X \\
(ii) & \forall v \in dom(\theta.alloc_{gvar}).\ \theta.alloc_{gvar}(v) = \theta^{imp}.alloc_{gvar}(v) \\
(iii) & \forall f \in dom(\theta.F_{adr}).\ \theta.F_{adr}(f) = \theta^{imp}.F_{adr}(f) \\
(iv) & \forall t \in dom(\theta.size_{struc}).\ \theta.size_{struc}(t) = \theta^{imp}.size_{struc}(t) \\
(v) & \forall t \in dom(\theta.offset_{struc}).\ \theta.offset_{struc}(t) = \theta^{imp}.offset_{struc}(t)
\end{array}
\]
The whole compiled program is placed at the same base addresses $cba = (cba_{cil}, cba_\mu)$ such that $validcode_{MX}(info^{imp}, cba)$ holds for the full compiler information $info^{imp} = (info^{imp}_{cil}, info^{imp}_\mu) \in infoT_{MX}$, computed as $info^{imp}_\mu = cmpl_{MASM}(\pi^{imp}_\mu)$ and $info^{imp}_{cil} = cmpl_{CIL}(\pi^{imp}_{cil})$. Similarly to $\theta^{imp}$ and $\theta$ above, we introduce the predicate
\[
validcmpl_{Th}(\pi, \pi^{th}, \pi^{imp}) \in \mathbb{B}
\]
with requirements on the compilation of $\pi$ and $\pi^{imp}$ such that (i) the compiled codes of the kernel and the framework for kernel threads are placed after each other
\[
\begin{aligned}
info^{imp}_{cil}.code &= info_{cil}.code \circ cmpl_{CIL}(\pi^{th}_{cil}).code \\
info^{imp}_\mu.code &= info_\mu.code \circ cmpl_{MASM}(\pi^{th}_\mu).code
\end{aligned}
\]
and (ii) all other components of $info$ are included into $info^{imp}$. The formal definition is intuitive and we do not provide it here for brevity.
Definition 8.18 (Memory Addresses Occupied by Global Variables of the Framework for Kernel Threads). Now, having $\pi_{th}$ and $\theta_{imp}$ one can compute all memory byte addresses of the global variables declared in $\pi_{th}$ (see Section 8.2.1.1) and used for the implementation of the kernel threads. We denote this set by
\[
A^{data}_{th}(\pi_{th}, \theta_{imp}) \subset \mathbb{B}^{32}
\]
and do not provide its formal definition here for brevity.
In order to determine other parameters for the extended mixed machine, we introduce auxil-
iary functions computing base addresses of elements of arrays and fields of structures.
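Definition 8.19 (Base Addresses of an Array's Element). For a declared array variable $(v, t) \in V \times T_Q$ with $\mathit{qt2t}(t) = \mathit{array}(t', n)$ for some $n \in \mathbb{N}$ we define the base address of an array's element with index $i \in \mathbb{N}_n$:
\[
ba^{\theta_{imp}}_{elem}(v, t', i) \stackrel{def}{\equiv} \theta_{imp}.\mathit{alloc}_{gvar}(v) +_{32} \left( i \cdot \mathit{size}_{\theta_{imp}}(t') \right)_{32}
\]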
Definition 8.20 (Base Addresses of a Field). Given an address $a \in \mathbb{B}^{32}$ of a composite type $t \in T_C$, we compute the base address of its field $f \in F$ as
\[
ba^{\theta_{imp}}_{field}(a, t, f) \stackrel{def}{\equiv} a +_{32} \left( \theta_{imp}.\mathit{offset}_{struc}(t, f) \right)_{32}
\]
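The two address computations of Definitions 8.19 and 8.20 amount to 32-bit modular arithmetic. A minimal C sketch, assuming that the allocation address, the element size, and the field offset have already been looked up in the environment parameters and are passed in as plain numbers (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Base address of element i of an array allocated at `alloc`, where each
 * element occupies `elem_size` bytes. Arithmetic on uint32_t wraps modulo
 * 2^32, which mirrors the 32-bit addition used in the definition. */
static uint32_t ba_elem(uint32_t alloc, uint32_t elem_size, uint32_t i)
{
    return alloc + i * elem_size;
}

/* Base address of a field located `field_offset` bytes into a structure
 * that starts at address `a`. */
static uint32_t ba_field(uint32_t a, uint32_t field_offset)
{
    return a + field_offset;
}
```

For example, with an array at address 0x1000 and 64-byte elements, element 3 starts at 0x10C0, and a field at offset 8 inside it starts at 0x10C8.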
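Therefore, the stack information abstraction is instantiated according to the linked program as follows:
• Base address of the stack information: $\mathit{StI}_{bas} \in (\mathbb{B}^{32})^{np}$ such that
\[
\mathit{StI}_{bas,i} = ba^{\theta_{imp}}_{elem}(\mathit{threads}, \mathit{Thread}, i)
\]
where $\mathit{threads} \in \mathit{decl}(\pi_{imp}.V_G)$ is a global variable and $\mathit{Thread} \in T_C$ is the composite type in the program.
• Size of a single stack information region in bytes:
\[
\mathit{StI}_{size} = \mathit{size}_{\theta_{imp}}(\mathit{Thread})
\]
• Address of the next stack information region:
\[
\mathit{StI}_{banext}(a, m) =
\begin{cases}
a_{next} & : a_{next} \neq 0^{32} \\
\bot & : \text{otherwise}
\end{cases}
\]
where the retrieved address $a_{next}$ is computed as
\[
a_{next} \equiv m_4\left( ba^{\theta_{imp}}_{field}(a, \mathit{Thread}, \mathit{next}) \right)
\]
• Partial functions retrieving the stack base address and maximal stack size:
\[
\begin{aligned}
\mathit{sba}(a, m) &= m_4\left( ba^{\theta_{imp}}_{field}(a_{tcb}, \mathit{TCB}, \mathit{sba}) \right) \\
\mathit{mss}(a, m) &= \left\langle m_4\left( ba^{\theta_{imp}}_{field}(a_{tcb}, \mathit{TCB}, \mathit{mss}) \right) \right\rangle
\end{aligned}
\]
where $a_{tcb}$ is the address of the TCB field in the structure $\mathit{Thread}$ computed as
\[
a_{tcb} \equiv ba^{\theta_{imp}}_{field}(a, \mathit{Thread}, \mathit{tcb})
\]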
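The instantiation above reads 4-byte words from memory at field addresses. The following C sketch illustrates this under explicit assumptions: memory is modelled as a byte array, words are read in little-endian order, and the field offsets are gathered in a hypothetical `layout_t` record that stands in for the structure offsets of the environment parameters. None of these names are taken from the thesis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte offsets of the fields used below. */
typedef struct {
    uint32_t thread_next; /* offset of `next` in Thread */
    uint32_t thread_tcb;  /* offset of `tcb` in Thread  */
    uint32_t tcb_sba;     /* offset of `sba` in TCB     */
    uint32_t tcb_mss;     /* offset of `mss` in TCB     */
} layout_t;

/* Reads the four bytes starting at address a (little-endian assumed). */
static uint32_t m4(const uint8_t *mem, uint32_t a)
{
    return (uint32_t)mem[a]
         | (uint32_t)mem[a + 1] << 8
         | (uint32_t)mem[a + 2] << 16
         | (uint32_t)mem[a + 3] << 24;
}

/* Helper for building example memories: stores a 32-bit word at address a. */
static void put4(uint8_t *mem, uint32_t a, uint32_t w)
{
    for (int i = 0; i < 4; i++)
        mem[a + i] = (uint8_t)(w >> (8 * i));
}

/* Address of the next stack information region; returns false (standing
 * for the undefined result) when the stored next pointer is zero. */
static bool sti_ba_next(const uint8_t *mem, const layout_t *l,
                        uint32_t a, uint32_t *next)
{
    uint32_t a_next = m4(mem, a + l->thread_next);
    if (a_next == 0)
        return false;
    *next = a_next;
    return true;
}

/* Stack base address stored in the TCB of the Thread structure at a. */
static uint32_t sba(const uint8_t *mem, const layout_t *l, uint32_t a)
{
    return m4(mem, a + l->thread_tcb + l->tcb_sba);
}

/* Maximal stack size stored in the TCB of the Thread structure at a. */
static uint32_t mss(const uint8_t *mem, const layout_t *l, uint32_t a)
{
    return m4(mem, a + l->thread_tcb + l->tcb_mss);
}
```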
8.3 Sequential Correctness in Concurrent Setting
As shown above, the linked program and the corresponding compiler information can easily be computed for given $\pi$ and $\pi_{th}$. For brevity, the following shorthands depending on $\pi$ are therefore used until the end of the chapter without mentioning them explicitly as parameters in definitions:
\[
\begin{aligned}
\pi_{imp} &\equiv \mathit{link}(\pi, \pi_{th}) \\
\mathit{info} &\equiv (\mathit{info}_{cil}, \mathit{info}_{\mu}) \equiv (\mathit{cmpl}_{CIL}(\pi_{cil}), \mathit{cmpl}_{MASM}(\pi_{\mu})) \\
\mathit{info}_{imp} &\equiv (\mathit{info}^{imp}_{cil}, \mathit{info}^{imp}_{\mu}) \equiv \left( \mathit{cmpl}_{CIL}(\pi^{imp}_{cil}), \mathit{cmpl}_{MASM}(\pi^{imp}_{\mu}) \right)
\end{aligned}
\]
In turn, we operate explicitly with $\pi$, $\theta$, $\theta_{imp}$, and some abstract and implementation configurations $c \in C_{Th}$ and $c_{imp} \in C_{MXA}$ as well as their components.
Additionally, we often refer to the shorthands $c_{mxa}$, $c_{mx}$, $c_{cil}$, and $\mathit{stmt}_{cil}$ computed for $c \in C_{Th}$ in Definition 8.13.
8.3.1 Consistency Points
The consistency points of the machine for kernel threads executing the program $\pi$ are defined on the basis of the MXA thread configuration obtained for the current kernel thread.
Definition 8.21 (Consistency Points of Machine for Kernel Threads). Given $k \in K^{proc}_{Th}$, $\pi$, $\theta$, and $cba$ from above, we compute
\[
\mathit{cp}_{Th}(k, \pi, \theta, cba) \stackrel{def}{\equiv} \mathit{cp}_{MXA}\left( \mathit{conf}^{cur}_{TKMXA}(k), \pi, \mathit{info}, \theta, cba \right)
\]
For $c \in C_{Th}$ we simply overload the definition as
\[
\mathit{cp}_{Th}(c, \pi, \theta, cba) \stackrel{def}{\equiv} \mathit{cp}_{Th}(c.k, \pi, \theta, cba)
\]
Recall that using Definition 6.34 we already required the C-IL compiler to insert the consis-
tency point directly before any function call. Therefore, as long as the location counter points to
the call of a primitive, the current thread stays in the consistency point.
For the consistency points of the MXA machine implementing the kernel threads we will
choose only those MXA consistency points from Definition 7.36 at which the simulation relation
between the MXA and the machine for kernel threads will hold. The natural candidates for
them are the consistency points in the program pi, at every ISA step executing any compiled code
of libraries used by the kernel, guest/user steps, and the IO-points inside the implementation
of the primitives in pith , except the cas call in the first iteration during the lock acquisition.
Definition 8.22 (Consistency Points for MXA Machine Implementing Kernel Threads). Given an MXA thread configuration $k_{imp} \in K_{MXA}$ in the machine implementing the kernel threads, we define whether the thread is at a consistency point by the predicate
\[
\begin{aligned}
& \mathit{cp}^{Th}_{MXA}(k_{imp}, \pi, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{isa}(k_{imp}) \implies \mathit{outside}(k_{imp}.core, \mathit{info}_{imp}, cba) \lor \mathit{inline}(k_{imp}.core, \pi_{\mu}, \mathit{info}_{\mu}, cba_{\mu}) \lor \mathit{cp}_{MXrMIPS}(k_{imp}.core, \pi, \mathit{info}, \theta, cba) \\
& \quad \text{(ii)}\ \mathit{mx}(k_{imp}) \implies \\
& \qquad \text{(a)}\ \mathit{masm}(ac_{imp}) \implies p_{top}(ac_{imp}.stack) \in \mathrm{dom}(\pi_{\mu}) \\
& \qquad \text{(b)}\ \mathit{cil}(ac_{imp}) \implies \mathit{info}_{cil}.cp(f_{top}(ac_{imp}), loc_{top}(ac_{imp})) \lor \mathit{cp}^{acqcas}_{MXA}(k_{imp}) \lor \mathit{cp}^{relvol}_{MXA}(k_{imp})
\end{aligned}
\]
where
• $ac_{imp}$ is the active context of the MXA thread
\[
ac_{imp} \equiv k_{imp}.k_{mx}.ac
\]
• $\mathit{cp}^{acqcas}_{MXA}(k_{imp})$ is the consistency point inside the spinlock acquisition function
\[
\mathit{cp}^{acqcas}_{MXA}(k_{imp}) \stackrel{def}{\equiv} f_{top}(ac_{imp}) = \mathit{acquire\_lock} \land loc_{top}(ac_{imp}) = 3 \land \langle it \rangle \neq 1
\]
such that
\[
it \equiv \mathit{bytes2bits}(\mathit{read}_{lm}(\mathcal{M}_{E_{top}}(ac_{imp}), i, 0, 4))
\]
is the bit-string value of the local variable $i$ indicating whether the first iteration with the call of cas in the spinlock is being performed.
• $\mathit{cp}^{relvol}_{MXA}(k_{imp})$ is the consistency point inside the spinlock release
\[
\mathit{cp}^{relvol}_{MXA}(k_{imp}) \stackrel{def}{\equiv} f_{top}(ac_{imp}) = \mathit{release\_lock} \land loc_{top}(ac_{imp}) = 1
\]
Since in the compiler correctness for the MX machine we already required the cas calls and accesses to volatile variables to be IO-points, we refer here explicitly to the functions acquiring and releasing locks and to the corresponding locations inside them (see Appendix A).
For a configuration $c_{imp} \in C_{MXA}$ of the MXA machine we also provide the shorthand
\[
\mathit{cp}^{Th}_{MXA}(c_{imp}, \pi, \theta_{imp}, cba) \stackrel{def}{\equiv} \mathit{cp}^{Th}_{MXA}(c_{imp}.k, \pi, \theta_{imp}, cba)
\]
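To make the two lock-related consistency points more tangible, here is a minimal sketch of a compare-and-swap spinlock written with C11 atomics. It is not the listing from Appendix A: the names `spinlock_t`, `acquire_lock_sketch`, and `release_lock_sketch` are hypothetical, and the convention that the lock word holds 0 when free and a non-zero owner identifier otherwise is an assumption made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: 0 means unlocked, any other value identifies the owner. */
typedef _Atomic uint32_t spinlock_t;

/* Spins until the lock word could be changed from 0 to `owner`.
 * Every compare-and-swap is an externally visible memory access,
 * which is why such calls are treated as IO-points. */
static void acquire_lock_sketch(spinlock_t *lock, uint32_t owner)
{
    uint32_t expected = 0;
    while (!atomic_compare_exchange_strong(lock, &expected, owner)) {
        /* A failed attempt stores the observed value in `expected`,
         * so it has to be reset before the next iteration. */
        expected = 0;
    }
}

/* Releases the lock with a single store, corresponding to the
 * volatile write at the release consistency point. */
static void release_lock_sketch(spinlock_t *lock)
{
    atomic_store(lock, 0);
}
```

In this sketch the owner identifier written by a successful compare-and-swap plays the role of the value that the shorthand for the accessing processor later reads back from the lock variable.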
8.3.2 Well-Formed Implementation
Before we consider the simulation relation between the sequential machine for threads and the MXA machine implementing them, we turn to the well-formedness of the MXA machine. In particular, since the stacks of threads must have a specific layout that allows the thread switch operation, we define this layout formally here. Such a layout is preserved by the implementation, provided that certain software conditions hold.
First, we introduce a few auxiliary definitions for operating on the data structures in the MXA
machine.
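For retrieving the values of variables and elements/fields of the data structures from the framework for kernel threads, we will not use the C-IL expression evaluation. Instead, we will get their bit-string values directly from the memory of the MXA machine. The reason is that a corresponding C-IL configuration is not always available. Moreover, the correctness criteria and required conditions can be stated more easily without considering the typed C-IL values. So, given the MXA machine memory $\mathcal{M}$ and a thread ID $t \in \mathbb{N}_{mtid}$, we define the shorthands:
• base address of the TCB
\[
ba^{\theta_{imp}}_{TCB}(\mathcal{M}, t) \equiv ba^{\theta_{imp}}_{field}\left( ba^{\theta_{imp}}_{elem}(\mathit{threads}, \mathit{Thread}, t), \mathit{Thread}, \mathit{tcb} \right)
\]
• bit-string value of a TCB field $f \in F$
\[
va^{\theta_{imp}}_{TCB}(\mathcal{M}, t, f) \equiv \mathcal{M}_4\left( ba^{\theta_{imp}}_{field}\left( ba^{\theta_{imp}}_{TCB}(\mathcal{M}, t), \mathit{TCB}, f \right) \right)
\]
• thread ID, state, affinity, argument, and stack size $X \in \{\mathit{tid}, \mathit{state}, \mathit{pid}, \mathit{arg}, \mathit{mss}\}$
\[
X_{\theta_{imp}}(\mathcal{M}, t) \equiv \left\langle va^{\theta_{imp}}_{TCB}(\mathcal{M}, t, X) \right\rangle
\]
• the address of the thread entry function
\[
ad^{\theta_{imp}}_{fn}(\mathcal{M}, t) \equiv va^{\theta_{imp}}_{TCB}(\mathcal{M}, t, \mathit{fn})
\]
• stack base address
\[
\mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \equiv va^{\theta_{imp}}_{TCB}(\mathcal{M}, t, \mathit{sba})
\]
• base address of the thread context
\[
ba^{\theta_{imp}}_{cntx}(\mathcal{M}, t) \equiv ba^{\theta_{imp}}_{field}\left( ba^{\theta_{imp}}_{TCB}(\mathcal{M}, t), \mathit{TCB}, \mathit{cntx} \right)
\]
• stack pointer and frame base pointer
\[
\begin{aligned}
\mathit{sp}_{\theta_{imp}}(\mathcal{M}, t) &\equiv \mathcal{M}_4\left( ba^{\theta_{imp}}_{field}\left( ba^{\theta_{imp}}_{cntx}(\mathcal{M}, t), \mathit{Thcntx}, \mathit{sp} \right) \right) \\
\mathit{bp}_{\theta_{imp}}(\mathcal{M}, t) &\equiv \mathcal{M}_4\left( ba^{\theta_{imp}}_{field}\left( ba^{\theta_{imp}}_{cntx}(\mathcal{M}, t), \mathit{Thcntx}, \mathit{bp} \right) \right)
\end{aligned}
\]
For a given processor ID $pid \in \mathbb{N}_{np}$ we denote the index of the current thread obtained from the memory $\mathcal{M}$ as
\[
\mathit{ct}_{\theta_{imp}}(\mathcal{M}, pid) \equiv \left\langle \mathcal{M}_4\left( ba^{\theta_{imp}}_{elem}(\mathit{current}, \mathit{u32}, pid) \right) \right\rangle
\]
The identifier of the processor accessing the shared resources is computed by
\[
\mathit{ap}_{\theta_{imp}}(\mathcal{M}) \equiv \left\langle \mathcal{M}_4\left( \theta_{imp}.\mathit{alloc}_{gvar}(\mathit{lock}) \right) \right\rangle
\]
Along with the computation of the fields from the TCB of a given thread, we also collect all thread identifiers for the lists of ready, new, and free threads declared in $\pi_{th}$.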
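Definition 8.23 (Sets of Thread Identifiers from Lists). Given an address $a \in \mathbb{B}^{32} \cup \{\bot\}$ of the node of the list, we define $\mathit{TIDs}^{\theta_{imp}}(a, \mathcal{M}) \in 2^{\mathbb{N}_{mtid}}$ inductively as
\[
\mathit{TIDs}^{\theta_{imp}}(a, \mathcal{M}) \stackrel{def}{\equiv}
\begin{cases}
\emptyset & : a = \bot \lor a = 0^{32} \\
\{tid\} \cup \mathit{TIDs}^{\theta_{imp}}(\mathit{StI}_{banext}(a, \mathcal{M}), \mathcal{M}) & : \text{otherwise}
\end{cases}
\]
where the thread ID is computed as above
\[
\begin{aligned}
tid &\equiv \left\langle \mathcal{M}_4\left( ba^{\theta_{imp}}_{field}(ba_{TCB}, \mathit{TCB}, \mathit{tid}) \right) \right\rangle \\
ba_{TCB} &\equiv ba^{\theta_{imp}}_{field}(a, \mathit{Thread}, \mathit{tcb})
\end{aligned}
\]
Hence, for $X \in \{\mathit{new}, \mathit{free}\}$ we easily compute
\[
\mathit{TIDs}^{\theta_{imp}}_{X}(\mathcal{M}) \stackrel{def}{\equiv} \mathit{TIDs}^{\theta_{imp}}\left( \mathcal{M}_4\left( \theta_{imp}.\mathit{alloc}_{gvar}(X\_hd) \right), \mathcal{M} \right)
\]
In turn, for $Y \in \{\mathit{ready}, \mathit{fin}\}$ and $pid \in \mathbb{N}_{np}$ we have
\[
\mathit{TIDs}^{\theta_{imp}}_{Y}(\mathcal{M}, pid) \stackrel{def}{\equiv} \mathit{TIDs}^{\theta_{imp}}\left( \mathcal{M}_4\left( ba^{\theta_{imp}}_{elem}(Y\_hd, \mathit{ptr}(\mathit{thread\_t}), pid) \right), \mathcal{M} \right)
\]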
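The inductive definition of the identifier sets corresponds to a plain traversal of a singly linked list in memory. The sketch below shows this traversal in C under the same assumptions as before (byte-array memory, little-endian words, hypothetical field offsets collected in `list_layout_t`). It returns the identifiers in list order in an output array rather than as a mathematical set, and it bounds the walk by the capacity of that array, which the definition itself does not need because it presumes a finite list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offsets of the fields needed for the traversal. */
typedef struct {
    uint32_t thread_next; /* offset of `next` in Thread */
    uint32_t thread_tcb;  /* offset of `tcb` in Thread  */
    uint32_t tcb_tid;     /* offset of `tid` in TCB     */
} list_layout_t;

/* Reads the four bytes starting at address a (little-endian assumed). */
static uint32_t m4(const uint8_t *mem, uint32_t a)
{
    return (uint32_t)mem[a]
         | (uint32_t)mem[a + 1] << 8
         | (uint32_t)mem[a + 2] << 16
         | (uint32_t)mem[a + 3] << 24;
}

/* Helper for building example memories: stores a 32-bit word at address a. */
static void put4(uint8_t *mem, uint32_t a, uint32_t w)
{
    for (int i = 0; i < 4; i++)
        mem[a + i] = (uint8_t)(w >> (8 * i));
}

/* Collects the thread IDs of the list whose first node is at address a
 * (0 denotes the empty list). Writes at most `cap` IDs to `out` and
 * returns the number of IDs written. */
static size_t tids(const uint8_t *mem, const list_layout_t *l,
                   uint32_t a, uint32_t *out, size_t cap)
{
    size_t n = 0;
    while (a != 0 && n < cap) {
        out[n++] = m4(mem, a + l->thread_tcb + l->tcb_tid);
        a = m4(mem, a + l->thread_next);
    }
    return n;
}
```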
The well-formedness of the stack and data structures implementing a kernel thread depends
on the state of this thread.
Sleeping Thread Recall that a thread switch is performed when the current kernel thread, represented by an MXA thread in the MXA machine, executes the function thread_run which, in turn, calls the MASM procedure switch_stack.
According to Listing 8.1, the MX stack abstraction of the current thread is lost at the beginning of the inline assembly (line 1) because of the switch to the consistent MIPS-86 configuration. If the physical stack residing in the memory is not modified later, we can reconstruct almost the same MX thread configuration when this thread is scheduled again. The only components that can differ are the location counter in the topmost frame and the values of the GPRs and SPRs, because they are not saved on the stack or in the TCB. The reconstruction is performed according to the MXA semantics before returning from switch_stack (line 2).
In order to show the correctness of the thread switch, we require that this reconstruction on the basis of the stack residing in the memory and the TCB fields is always possible for any sleeping thread.
First, we consider which reconstructed MX thread configurations we are interested in.
Definition 8.24 (Reconstructed MX Thread Configuration Well-Formed for Sleeping Thread). For $k_{mx} \in K_{MX}$ we require that (i) the active context is represented by a well-formed MASM frame for switch_stack with the location at the return from the procedure, (ii) the list of inactive contexts has a C-IL context on its top where the stack contains at least two frames, and (iii) its well-formed topmost frame belongs to the function thread_run with the location after the call of switch_stack. Using $ic_{top} \equiv k_{mx}.ic[|k_{mx}.ic|]$ we define
\[
\begin{aligned}
& \mathit{wf}^{sleep}_{KMX}(k_{mx}, \theta_{imp}) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{masm}(k_{mx}.ac) \land k_{mx}.ac.stack = fr \\
& \quad \text{(ii)}\ \mathit{cil}(ic_{top}) \land |ic_{top}| \geq 2 \\
& \quad \text{(iii)}\ ic_{top}[|ic_{top}|] = fr'
\end{aligned}
\]
such that the frame of the active context satisfies the conditions
\[
fr.p = \mathit{switch\_stack} \qquad fr.loc = 2 \qquad |fr.pars| = 4 \qquad fr.lifo = \varepsilon
\]
\[
\mathrm{dom}(fr.saved) = \#Reg_{callee}
\]
and the frame $fr'$ has the properties
\[
fr'.f = \mathit{thread\_run} \qquad fr'.rds = \bot \qquad fr'.loc = 10
\]
\[
\mathrm{dom}\left( fr'.\mathcal{M}_E \right) = \mathit{decl}\left( \pi^{th}_{cil}.F(\mathit{thread\_run}).V \right)
\]
\[
\forall (v, t) \in \pi^{th}_{cil}.F(\mathit{thread\_run}).V.\ |fr'.\mathcal{M}_E(v)| = \mathit{size}_{\theta_{imp}}(\mathit{qt2t}(t))
\]
Note that the values of the arguments and local variables do not matter because they are not used during scheduling the sleeping thread. The same applies to the content of the general and special purpose registers in $k_{mx}$.
As the next step, we have to choose the MIPS ISA configuration for which we state that the reconstruction of an MX thread conforming to $\mathit{wf}^{sleep}_{KMX}$ is always possible. As long as the thread remains sleeping, the MXA configuration is irrelevant except for the memory region with the TCB and the stack of this thread. According to $\mathit{wf}^{sleep}_{KMX}$, we do not have to care about the irrelevant parts of the configuration and can easily compose the ISA configuration needed for the reconstruction.
Definition 8.25 (ISA Configuration for the MX Thread Reconstruction of the Sleeping Thread). Given an identifier $t \in \mathbb{N}_{mtid}$ of a sleeping thread and the memory configuration $\mathcal{M}$ of the MXA machine we define
\[
\mathit{conf}^{sleep}_{MIPS}(t, \mathcal{M}, \pi_{imp}, \theta_{imp}, cba) \stackrel{def}{\equiv} (cpu, \mathcal{M})
\]
such that $cpu \in C_{proc}$ satisfies the following conditions:
\[
\begin{aligned}
cpu.core.pc &= ca_{MASM}\left( \mathit{switch\_stack}, 2, \mathit{info}^{imp}_{\mu}, cba_{\mu} \right) \\
cpu.core.gpr(sp) &= \mathit{sp}_{\theta_{imp}}(\mathcal{M}, t) \\
cpu.core.gpr(bp) &= \mathit{bp}_{\theta_{imp}}(\mathcal{M}, t)
\end{aligned}
\]
All other components of the processor are either not involved in the reconstruction or irrelevant and, therefore, may have arbitrary configurations.
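Definition 8.26 (Well-Formed TCB and Stack for Sleeping Thread). Finally, we can define that the TCB and the stack residing in the memory $\mathcal{M}$ of the MXA machine implementing the kernel threads are well-formed for a sleeping thread $t \in \mathbb{N}_{mtid}$ with $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_SLEEPING}$.
\[
\begin{aligned}
& \mathit{wf}^{sleep}_{MXA}(t, \mathcal{M}, \pi_{imp}, \theta_{imp}, cba) \stackrel{def}{\equiv} \exists k_{mx} \in K_{MX}. \\
& \quad \text{(i)}\ k_{mx} = R^{\pi_{imp}, \theta_{imp}}_{KMX}\left( d, \mathit{info}_{imp}, cba, \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \right) \\
& \quad \text{(ii)}\ \mathit{wf}^{sleep}_{KMX}(k_{mx}, \theta_{imp})
\end{aligned}
\]
with $d \equiv \mathit{conf}^{sleep}_{MIPS}(t, \mathcal{M}, \pi_{imp}, \theta_{imp}, cba)$.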
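New Thread The well-formedness of the TCB and the physical stack of any new thread reflects the physical stack layout in Figure 8.3.
Definition 8.27 (Well-Formed TCB and Stack for New Thread). For a new thread with an identifier $t \in \mathbb{N}_{mtid}$ such that $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_NEW}$ holds we require that (i) the argument passed to its entry function is stored on the bottom of the physical stack, (ii) the return address in the topmost frame is the address of the entry function, and (iii) – (iv) the frame base pointer and the stack pointer in the thread context are set properly.
\[
\begin{aligned}
& \mathit{wf}^{new}_{MXA}(t, \mathcal{M}, \theta_{imp}) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{arg}_{\theta_{imp}}(\mathcal{M}, t) = \langle \mathcal{M}_4(ad_{bot}) \rangle \\
& \quad \text{(ii)}\ ad^{\theta_{imp}}_{fn}(\mathcal{M}, t) = ra\left( \mathcal{M}, \mathit{bp}_{\theta_{imp}}(\mathcal{M}, t) \right) \\
& \quad \text{(iii)}\ \mathit{bp}_{\theta_{imp}}(\mathcal{M}, t) = ad_{bot} -_{32} (4 \cdot 7)_{32} \\
& \quad \text{(iv)}\ \mathit{sp}_{\theta_{imp}}(\mathcal{M}, t) = \mathit{bp}_{\theta_{imp}}(\mathcal{M}, t) -_{32} (4 \cdot \#Reg_{callee})_{32}
\end{aligned}
\]
where $ad_{bot} \equiv \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) -_{32} 3_{32}$ is the address of the word on the bottom of the stack (or the first item).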
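Conditions (iii) and (iv) are pure pointer arithmetic and can be checked directly. A minimal C sketch follows. The number of callee-saved registers is passed as a parameter because its concrete value is not stated in this section, and the function name `wf_new_layout` is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks the pointer layout of a new thread:
 *   ad_bot = sba - 3                 (address of the bottom word)
 *   bp     = ad_bot - 4 * 7          (condition (iii))
 *   sp     = bp - 4 * n_callee       (condition (iv))
 * All subtractions wrap modulo 2^32, as the 32-bit subtraction does. */
static bool wf_new_layout(uint32_t sba, uint32_t bp, uint32_t sp,
                          uint32_t n_callee)
{
    uint32_t ad_bot = sba - 3u;
    return bp == ad_bot - 4u * 7u
        && sp == bp - 4u * n_callee;
}
```

For instance, with a stack base address of 0x1003 and eight callee-saved registers (an assumed count), the bottom word is at 0x1000, the frame base pointer must be 0xFE4, and the stack pointer must be 0xFC4.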
Running Thread In this case we define the well-formedness on the whole configuration of the sequential MXA machine that must hold at consistency points from Definition 8.22.
Let $pid(c_{imp})$ be the processor identifier in a configuration $c_{imp} \in C_{MXA}$ retrieved as
\[
pid(c_{imp}) \equiv
\begin{cases}
\langle c_{imp}.k.k_{mx}.spr(pid) \rangle & : \mathit{mx}(c_{imp}) \\
\langle c_{imp}.k.core.spr(pid) \rangle & : \mathit{isa}(c_{imp})
\end{cases}
\]
Definition 8.28 (Well-Formed MXA Configuration for Running Thread). For a running (or current) thread with an identifier $t \in \mathbb{N}_{mtid}$ such that $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_RUNNING}$ holds and a configuration $c_{imp}$ of the MXA machine encoding this thread we require that (i) the MXA configuration is well-formed, (ii.a) before the lock acquisition in acquire_lock and at the C-IL calls of primitives implemented with the lock protection the lock is not acquired by the considered processor yet, and (ii.b) at the moment of the lock release it belongs to this processor.
\[
\begin{aligned}
& \mathit{wf}^{run}_{MXA}(t, c_{imp}, \pi, \pi_{imp}, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{wfconf}^{\pi_{imp}, \theta_{imp}, cba}_{MXA}(c_{imp}) \\
& \quad \text{(ii)}\ \mathit{mx}(c_{imp}.k) \land \mathit{cil}(ac_{imp}) \implies \\
& \qquad \text{(a)}\ \mathit{cp}^{acqcas}_{MXA}(c_{imp}.k) \lor \mathit{info}_{cil}.cp(f_{top}(ac_{imp}), loc_{top}(ac_{imp})) \implies \mathit{ap}_{\theta_{imp}}(c_{imp}.\mathcal{M}) \neq pid(c_{imp}) \\
& \qquad \text{(b)}\ \mathit{cp}^{relvol}_{MXA}(c_{imp}.k) \implies \mathit{ap}_{\theta_{imp}}(c_{imp}.\mathcal{M}) = pid(c_{imp})
\end{aligned}
\]
with $ac_{imp} \equiv c_{imp}.k.k_{mx}.ac$.
Note that in (ii.a) for brevity we consider all consistency points in the C-IL program of the hypervisor / OS kernel though we are only interested in those at which the mentioned functions are called.
MXA Machine Implementing Kernel Threads Finally, in order to define the well-formedness
of an MXA machine configuration for kernel thread implementation, we consider it separately
for the local lists belonging to the processor and the global ones as declared in pith .
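Definition 8.29 (Well-Formed MXA Configuration for All Processor's Threads). Then, the well-formedness of $c_{imp} \in C_{MXA}$ for all processor's threads requires that (i) – (ii) the current thread has the proper state in its TCB and $c_{imp}$ is well-formed for it, (iii) all other threads from the list of ready ones have the states of sleeping or new threads and their TCBs as well as physical stacks in the memory are well-formed, (iv) the list of finished threads contains only finished threads.
Using $ct \equiv \mathit{ct}_{\theta_{imp}}(\mathcal{M}, pid(c_{imp}))$ we formally define
\[
\begin{aligned}
& \mathit{wf}^{proc}_{MXA}(c_{imp}, \pi_{imp}, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, ct) = \mathtt{TS\_RUNNING} \\
& \quad \text{(ii)}\ \mathit{wf}^{run}_{MXA}(ct, c_{imp}, \pi_{imp}, \theta_{imp}, cba) \\
& \quad \text{(iii)}\ \forall t \in \mathit{TIDs}^{\theta_{imp}}_{ready}(c_{imp}.\mathcal{M}, pid(c_{imp})) \setminus \{ct\}. \\
& \qquad \text{(a)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) \in \{\mathtt{TS\_SLEEPING}, \mathtt{TS\_NEW}\} \\
& \qquad \text{(b)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_SLEEPING} \implies \mathit{wf}^{sleep}_{MXA}(t, c_{imp}.\mathcal{M}, \pi_{imp}, \theta_{imp}, cba) \\
& \qquad \text{(c)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_NEW} \implies \mathit{wf}^{new}_{MXA}(t, c_{imp}.\mathcal{M}, \theta_{imp}) \\
& \quad \text{(iv)}\ \forall t \in \mathit{TIDs}^{\theta_{imp}}_{fin}(c_{imp}.\mathcal{M}, pid(c_{imp})).\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_FINISHED}
\end{aligned}
\]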
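The state-related parts of this definition, namely (i), (iii.a), and (iv), can be pictured as a simple check over the per-processor lists. The C sketch below covers only these parts and leaves out the nested well-formedness predicates for running, sleeping, and new threads. The enumeration order of the states, the array-based representation of the lists, and the name `wf_proc_states` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Thread states named in this section; the numeric values are arbitrary. */
typedef enum {
    TS_FREE, TS_NEW, TS_RUNNING, TS_SLEEPING, TS_FINISHED
} tstate_t;

/* state[t] is the TCB state of thread t, `ct` the current thread,
 * `ready` and `fin` the thread IDs found in the two per-processor lists. */
static bool wf_proc_states(const tstate_t *state, uint32_t ct,
                           const uint32_t *ready, size_t n_ready,
                           const uint32_t *fin, size_t n_fin)
{
    /* (i) the current thread is marked as running */
    if (state[ct] != TS_RUNNING)
        return false;
    /* (iii.a) every other ready thread is sleeping or new */
    for (size_t i = 0; i < n_ready; i++) {
        uint32_t t = ready[i];
        if (t != ct && state[t] != TS_SLEEPING && state[t] != TS_NEW)
            return false;
    }
    /* (iv) the finished list contains only finished threads */
    for (size_t i = 0; i < n_fin; i++) {
        if (state[fin[i]] != TS_FINISHED)
            return false;
    }
    return true;
}
```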
Since the access to the global lists of new and free threads is lock protected and is only pos-
sible when the lists are unlocked or locked by the considered processor, we define the well-
formedness for them only under these conditions.
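Definition 8.30 (Well-Formed MXA Configuration for New and Free Threads). The well-formedness of a configuration $c_{imp}$ in this case assumes that all threads in both global lists have proper states in their TCBs, and the TCBs and stacks for new threads are well-formed.
\[
\begin{aligned}
& \mathit{wf}^{free/new}_{MXA}(c_{imp}, \theta_{imp}) \stackrel{def}{\equiv} \mathit{ap}_{\theta_{imp}}(c_{imp}.\mathcal{M}) \in \{0, pid(c_{imp})\} \implies \\
& \quad \text{(i)}\ \forall t \in \mathit{TIDs}^{\theta_{imp}}_{new}(c_{imp}.\mathcal{M}).\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_NEW} \land \mathit{wf}^{new}_{MXA}(t, c_{imp}.\mathcal{M}, \theta_{imp}) \\
& \quad \text{(ii)}\ \forall t \in \mathit{TIDs}^{\theta_{imp}}_{free}(c_{imp}.\mathcal{M}).\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_FREE}
\end{aligned}
\]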
Definition 8.31 (Well-Formed MXA Configuration for Kernel Threads Implementation). Finally, we combine Definitions 8.29 and 8.30 into the well-formedness of MXA machine configurations for kernel thread implementation as
\[
\begin{aligned}
& \mathit{wfconf}^{Th}_{MXA}(c_{imp}, \pi_{imp}, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{wf}^{proc}_{MXA}(c_{imp}, \pi_{imp}, \theta_{imp}, cba) \\
& \quad \text{(ii)}\ \mathit{wf}^{free/new}_{MXA}(c_{imp}, \theta_{imp}) \\
& \quad \text{(iii)}\ \mathit{validcp}_{MX}(\pi_{imp}, \mathit{info}_{imp}, \theta_{imp})
\end{aligned}
\]
The last condition is in fact needed for proving the correctness of the switch to a new thread where the simulation relation must hold at the beginning of the body of the thread entry function.
8.3.3 Sequential Simulation Relation
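First, we separately define the consistency relation for each kind of kernel thread.
Definition 8.32 (Consistency Relation for Sleeping Thread). Given a configuration $th \in K^{nrun}_{Th}$ of a kernel thread with an identifier $t \in \mathbb{N}_{mtid}$ and the memory $\mathcal{M}$ of the MXA machine implementing the kernel threads such that the thread is sleeping, i.e., $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_SLEEPING}$ holds, we introduce the simulation relation $\mathit{consis}^{sleep}_{Th}(th, t, \mathcal{M}, \pi, \theta_{imp}, cba)$ stating that the active and inactive contexts in $th$ are equal to the corresponding parts of an existing MX thread configuration, and other components of $th$ are coupled with the fields in the TCB.
Formally, for $k_{mx} \in K_{MX}$ existing by Definition 8.26 and computed as
\[
\begin{aligned}
k_{mx} &\equiv R^{\pi_{imp}, \theta_{imp}}_{KMX}\left( d, \mathit{info}_{imp}, cba, \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \right) \\
d &\equiv \mathit{conf}^{sleep}_{MIPS}(t, \mathcal{M}, \pi_{imp}, \theta_{imp}, cba)
\end{aligned}
\]
and the shorthands
\[
ni \equiv |k_{mx}.ic| \qquad ic_{top} \equiv k_{mx}.ic[ni]
\]
we define
\[
\begin{aligned}
& \mathit{consis}^{sleep}_{Th}(th, t, \mathcal{M}, \pi, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ th.ic = k_{mx}.ic[1 : ni - 1] \\
& \quad \text{(ii)}\ th.ac = ic_{top}[1 : top(ic_{top}) - 1] \\
& \quad \text{(iii)}\ t = \mathit{tid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(iv)}\ th.pid = \mathit{pid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(v)}\ th.sba = \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(vi)}\ th.mss = \mathit{mss}_{\theta_{imp}}(\mathcal{M}, t)
\end{aligned}
\]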
In contrast to the sleeping threads that can run again on the processor, the finished threads
are never scheduled. Therefore, we are not interested in their stack abstraction and can skip the
coupling relation for it.
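Definition 8.33 (Consistency Relation for Finished Thread). Given a configuration $th \in K^{nrun}_{Th}$ of a finished thread $t \in \mathbb{N}_{mtid}$ and the MXA machine memory $\mathcal{M}$ such that $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_FINISHED}$ holds, we define the simulation relation
\[
\begin{aligned}
& \mathit{consis}^{fin}_{Th}(th, t, \mathcal{M}, \theta_{imp}) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ t = \mathit{tid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(ii)}\ th.pid = \mathit{pid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(iii)}\ th.sba = \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(iv)}\ th.mss = \mathit{mss}_{\theta_{imp}}(\mathcal{M}, t)
\end{aligned}
\]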
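Definition 8.34 (Consistency Relation for New Thread). Given a new thread $th \in K^{nrun}_{Th}$ with an identifier $t \in \mathbb{N}_{mtid}$ and the memory $\mathcal{M}$ such that $\mathit{state}_{\theta_{imp}}(\mathcal{M}, t) = \mathtt{TS\_NEW}$ holds, the simulation relation $\mathit{consis}^{new}_{Th}(th, t, \mathcal{M}, \pi, \theta, \theta_{imp}, cba)$ specifies that the function name and the argument in the frame of the active context as well as all other components of $th$ not belonging to the stack abstraction are coupled with the corresponding fields in the TCB of the given thread. Moreover, we include the well-formedness of the new thread from Definition 8.11. Though this well-formedness is required for all threads in $\mathit{new}$ in Definition 8.12, it is formally lost after the thread is moved to $\mathit{ready}$ (see Definition 8.10) and we need to keep it here for the correctness proof.
Using the shorthands $st \equiv th.ac$ and $(v, t) \equiv \pi_{imp}.F(f_{top}(st)).V[1]$ we formally define
\[
\begin{aligned}
& \mathit{consis}^{new}_{Th}(th, t, \mathcal{M}, \pi, \theta, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ f_{top}(st) = \epsilon\left\{ f' \in F_{name} \;\middle|\; \theta_{imp}.F_{adr}(f') = ad^{\theta_{imp}}_{fn}(\mathcal{M}, t) \right\} \\
& \quad \text{(ii)}\ \left\langle \mathit{bytes2bits}\left( \mathcal{M}_{E_{top}}(st)(v) \right) \right\rangle = \mathit{arg}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(iii)}\ t = \mathit{tid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(iv)}\ th.pid = \mathit{pid}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(v)}\ th.sba = \mathit{sba}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(vi)}\ th.mss = \mathit{mss}_{\theta_{imp}}(\mathcal{M}, t) \\
& \quad \text{(vii)}\ \mathit{wfth}^{\pi, \theta}_{new}(th)
\end{aligned}
\]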
The simulation relation for the current thread depends on whether it performs MXA machine
steps or executes a primitive. Since the semantics of the primitives implemented with locks
is split into phases, we must support the simulation relation at consistency points inside their
function bodies. Recall that during any primitive execution we do not create a frame in the
configuration of the current thread. In the implementation, however, we get two additional
frames for the special function call and the lock acquire/release operation. If such a primitive
has parameters and the implementation MXA machine is at a consistency point inside the lock
acquisition function, one should couple the local parameters in the frame of the primitive with
the values of the arguments passed during the call by the current thread. Moreover, the loca-
tion in the caller’s frame in the implementation is one position ahead in comparison with the
configuration of the thread. All these facts are taken into account in the following definition.
Definition 8.35 (Consistency Relation for Running Thread). Given a configuration $c \in C_{Th}$ of the sequential machine for kernel threads and a configuration $c_{imp} \in C_{MXA}$ of the implementation machine, we define the coupling relation $\mathit{consis}^{run}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba)$ specifying how the MXA machine encodes the current kernel thread.
Namely, we require that (i) the processor configurations are fully coupled during inline assembly steps in $\pi$ or user/guest transitions, (ii) if no primitive is called during MX machine steps, the configuration of the running kernel thread is equal to the configuration of the MXA thread, (iii) the same holds before the call of the thread switch or exit. (iv) In case of any other primitive we distinguish whether the call is already performed in the implementation or not and couple the configurations according to the discussion above. (v) In particular, as mentioned before, if thread_create or thread_delete is already called in the MXA machine, we couple their parameters on the stack with the arguments in the call of the primitive. (vi) – (viii) The thread identifier, stack base address, and maximal stack size in the thread configuration should be equal to the fields in its TCB.
Let the stacks of the active context in $c$ and $c_{imp}$ be denoted by
\[
st \equiv th_{cur}(c).k_{mx}.ac \qquad st_{imp} \equiv c_{imp}.k.k_{mx}.ac
\]
and the difference between the lengths of these stacks be computed as
\[
\mathit{dif} \equiv top(st_{imp}) - top(st)
\]
Moreover, let $c'_{imp}$ be an MXA machine configuration obtained from $c_{imp}$ by dropping $\mathit{dif}$ topmost frames from the stack of the active context as
\[
c'_{imp}.k.k_{mx}.ac = st_{imp}[1 : top(st)]
\]
Then, using additional shorthands $st'_{imp} \equiv st_{imp}[1 : top(st_{imp}) - 1]$, $(v_i, t_i) \equiv F^{\theta}_{\pi_{cil}}(f).V[i]$, and $\mathit{stmt}_{cil}$, $c_{cil}$ from Definition 8.13, we formally define the simulation relation as
\[
\begin{aligned}
& \mathit{consis}^{run}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ th_{cur}(c) \in K^{isa}_{Th} \implies th_{cur}(c).cpu = c_{imp}.k \\
& \quad \text{(ii)}\ th_{cur}(c) \in K^{mx}_{Th} \land \lnot \mathit{prim}^{\pi, \theta}_{Th}(c) \implies th_{cur}(c) = c_{imp}.k \\
& \quad \text{(iii)}\ \mathit{prim}^{\pi, \theta}_{Th}(c) \land \mathit{stmt}_{cil} = \mathbf{call}\ f(E) \land f \in \{\mathit{thread\_run}, \mathit{thread\_exit\_to}\} \implies th_{cur}(c) = c_{imp}.k \\
& \quad \text{(iv)}\ \mathit{prim}^{\pi, \theta}_{Th}(c) \land \mathit{stmt}_{cil} = \mathbf{call}\ f'(E) \land f' \in F^{prim}_{name} \setminus \{\mathit{thread\_run}, \mathit{thread\_exit\_to}\} \implies \\
& \qquad (\mathit{dif} = 0 \land th_{cur}(c) = c_{imp}.k) \lor \left( \mathit{dif} = 2 \land f_{top}(st'_{imp}) = f' \land th_{cur}(\mathit{inc}^{cur}_{loc}(c)) = c'_{imp}.k \right) \\
& \quad \text{(v)}\ \mathit{prim}^{\pi, \theta}_{Th}(c) \land \mathit{stmt}_{cil} = \mathbf{call}\ f(E) \land f \in \{\mathit{thread\_create}, \mathit{thread\_delete}\} \land \mathit{cp}^{acqcas}_{MXA}(c_{imp}.k) \implies \\
& \qquad \forall i \in \mathbb{N}_{|E|}.\ \mathit{val2bytes}\left( [\![E[i]]\!]^{\pi_{cil}, \theta}_{c_{cil}} \right) = \mathcal{M}_{E_{top}}(st'_{imp})(v_i) \\
& \quad \text{(vi)}\ c.k.ct = \mathit{tid}_{\theta_{imp}}(\mathcal{M}, c.k.ct) \\
& \quad \text{(vii)}\ th_{cur}(c).sba = \mathit{sba}_{\theta_{imp}}(\mathcal{M}, c.k.ct) \\
& \quad \text{(viii)}\ th_{cur}(c).mss = \mathit{mss}_{\theta_{imp}}(\mathcal{M}, c.k.ct)
\end{aligned}
\]
Finally, we can define the consistency relation for all threads belonging to the processor.
Definition 8.36 (Consistency Relation for Processor's Threads). Given configurations $c \in C_{Th}$ and $c_{imp} \in C_{MXA}$ of the sequential machine for kernel threads and the MXA machine implementing it as well as $\pi$, $\theta$, $\theta_{imp}$, and $cba$ considered above, we define the simulation relation for the processor's threads $c.k$ claiming that (i) the current thread identifier resides in the corresponding element in the array $\mathit{current}$ of the framework $\pi_{th}$, (ii) the sets of identifiers of all threads ready for scheduling or scheduled are equal in the specification and the implementation, (iii) – (iv) the corresponding consistency relations for the current, new, and sleeping threads belonging to the processor hold, and (v) – (vi) the same finished threads covered by the consistency relation are considered in both machines.
Using $p \equiv pid_{cur}(c)$ we define
\[
\begin{aligned}
& \mathit{consis}^{proc}_{Th}(c, c_{imp}, \pi, \theta, \theta_{imp}, cba) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ c.k.ct = \mathit{ct}_{\theta_{imp}}(c_{imp}.\mathcal{M}, p) \\
& \quad \text{(ii)}\ \mathrm{dom}(c.k.ready) = \mathit{TIDs}^{\theta_{imp}}_{ready}(c_{imp}.\mathcal{M}, p) \\
& \quad \text{(iii)}\ \mathit{consis}^{run}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba) \\
& \quad \text{(iv)}\ \forall t \in \mathrm{dom}(c.k.ready) \setminus \{c.k.ct\}. \\
& \qquad \text{(a)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_SLEEPING} \implies \mathit{consis}^{sleep}_{Th}(c.k.ready(t), t, c_{imp}.\mathcal{M}, \pi, \theta_{imp}, cba) \\
& \qquad \text{(b)}\ \mathit{state}_{\theta_{imp}}(c_{imp}.\mathcal{M}, t) = \mathtt{TS\_NEW} \implies \mathit{consis}^{new}_{Th}(c.k.ready(t), t, c_{imp}.\mathcal{M}, \pi, \theta, \theta_{imp}, cba) \\
& \quad \text{(v)}\ \mathrm{dom}(c.k.fin) = \mathit{TIDs}^{\theta_{imp}}_{fin}(c_{imp}.\mathcal{M}, p) \\
& \quad \text{(vi)}\ \forall t \in \mathrm{dom}(c.k.fin).\ \mathit{consis}^{fin}_{Th}(c.k.fin(t), t, c_{imp}.\mathcal{M}, \theta_{imp})
\end{aligned}
\]
Note that stateθimp (cimp .M, c.k.ct) = TS RUNNING is a part of the well-formedness in Defini-
tion 8.29 and we do not repeat it here.
As for the components new and free of the machine configuration c ∈ CTh , we can couple
them with the implementation only when we can access them according to the semantics of the
machine for kernel threads.
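Definition 8.37 (Consistency Relation for New and Free Threads). If the condition $c.ap = 0 \lor c.ap = pid_{cur}(c)$ holds, we can require that (i) the free identifiers of threads are equal in $c$ and $c_{imp}$, and (ii) – (iii) the same newly created threads satisfying the simulation relation from Definition 8.34 are present in both machines.
\[
\begin{aligned}
& \mathit{consis}^{free/new}_{Th}(c, c_{imp}, \pi, \theta, \theta_{imp}, cba) \stackrel{def}{\equiv} c.ap = 0 \lor c.ap = pid_{cur}(c) \implies \\
& \quad \text{(i)}\ c.free = \mathit{TIDs}^{\theta_{imp}}_{free}(c_{imp}.\mathcal{M}) \\
& \quad \text{(ii)}\ \mathrm{dom}(c.new) = \mathit{TIDs}^{\theta_{imp}}_{new}(c_{imp}.\mathcal{M}) \\
& \quad \text{(iii)}\ \forall t \in \mathrm{dom}(c.new).\ \mathit{consis}^{new}_{Th}(c.new(t), t, c_{imp}.\mathcal{M}, \pi, \theta, \theta_{imp}, cba)
\end{aligned}
\]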
In the definition of the memory consistency we have to exclude the memory regions occupied
by stacks having the counterparts in the configuration of the machine for kernel threads.
Definition 8.38 (Memory Addresses Occupied by Stacks of Processor's Threads). Given a configuration $k \in K^{proc}_{Th}$ we define the set $A^{stacks}_{proc}(k)$ of memory byte addresses occupied by stacks for which the corresponding MX stack abstraction is present in $k$.
Let $D_{Th}(k)$ be the set of identifiers of threads with MX stack abstraction
\[
D_{Th}(k) \stackrel{def}{\equiv} \mathrm{dom}(k.fin) \cup \mathrm{dom}(k.ready) \setminus
\begin{cases}
\{k.ct\} & : th_{cur}(k) \in K^{mx}_{Th} \\
\emptyset & : th_{cur}(k) \in K^{isa}_{Th}
\end{cases}
\]
Then, using the shorthand
\[
th(t) \equiv (\mathrm{dom}(k.fin) \uplus \mathrm{dom}(k.ready))(t)
\]
we formally define
\[
A^{stacks}_{proc}(k) \stackrel{def}{\equiv} \bigcup_{t \in D_{Th}(k)} A^{stack}_{MX}(th(t).sba, th(t).mss)
\]
Analogously, we define the set of addresses for the stacks of the new threads.
Definition 8.39 (Memory Addresses Occupied by Stacks of New Threads). Given a configuration $c \in C_{Th}$, we compute
\[
A^{stacks}_{new}(c) \stackrel{def}{\equiv} \bigcup_{t \in \mathrm{dom}(c.new)} A^{stack}_{MX}(c.new(t).sba, c.new(t).mss)
\]
Now, using Definitions 8.18, 8.38, and 8.39 we define the memory consistency for the sequen-
tial machine for kernel threads in the concurrent setting. Recall that according to the semantics
of kernel threads (see MXA machine steps in Section 8.1.2.2 and the well-formedness in Defi-
nitions 7.5, 8.12) the compiled code of the hypervisor / OS kernel program pi is visible in the
memory of the machine for kernel threads. In turn, the MXA machine implementing the kernel
threads contains in the memory the compiled code of the whole linked program piimp . Therefore,
the memory region where the code of pi resides should be coupled by the simulation relation.
Definition 8.40 (Memory Consistency for Sequential Machine for Kernel Threads). For configurations $c \in C_{Th}$, $c_{imp} \in C_{MXA}$, other parameters as above, and memory addresses $icm \subset A_{hyp} \setminus A^{code}_{MX}(\mathit{info}, cba)$, the memory consistency relation $\mathit{consis}^{mem}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba, icm)$ requires that the content of the memory in both machines is equal except for the regions occupied by the compiled code of the framework $\pi_{th}$, all global variables declared in $\pi_{th}$, the stacks having consistent counterparts in $c$, and the region $icm$.
Computing the memory addresses occupied by the compiled code of the framework $\pi_{th}$
\[
A_{code} \equiv A^{code}_{MX}(\mathit{info}_{imp}, cba) \setminus A^{code}_{MX}(\mathit{info}, cba)
\]
and the stacks in $c$
\[
A_{stacks} \equiv A^{stacks}_{proc}(c.k) \cup A^{stacks}_{new}(c)
\]
we formally define
\[
\mathit{consis}^{mem}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba, icm) \stackrel{def}{\equiv} \forall a \in \mathbb{B}^{32} \setminus \left( A_{code} \cup A^{data}_{th}(\pi_{th}, \theta_{imp}) \cup A_{stacks} \cup icm \right).\ c.\mathcal{M}(a) = c_{imp}.\mathcal{M}(a)
\]
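The memory consistency relation compares two memories byte by byte while skipping a set of excluded addresses. A minimal C sketch of this idea follows, assuming that both memories are modelled as equally sized byte arrays and that the excluded set is given as a list of address ranges. The `region_t` representation and the function names are hypothetical and not taken from the thesis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A contiguous address range [base, base + size). */
typedef struct {
    uint32_t base;
    uint32_t size;
} region_t;

/* True iff address a lies in one of the n regions. The unsigned
 * subtraction wraps for a < base, so the comparison fails as intended. */
static bool in_regions(uint32_t a, const region_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a - r[i].base < r[i].size)
            return true;
    }
    return false;
}

/* True iff both memories agree on every address below n_bytes that is
 * not covered by an excluded region (framework code, framework data,
 * coupled stacks, and the region icm in the definition). */
static bool consis_mem(const uint8_t *m_abs, const uint8_t *m_imp,
                       uint32_t n_bytes,
                       const region_t *excluded, size_t n_excl)
{
    for (uint32_t a = 0; a < n_bytes; a++) {
        if (!in_regions(a, excluded, n_excl) && m_abs[a] != m_imp[a])
            return false;
    }
    return true;
}
```

A difference inside an excluded region is tolerated, whereas the same difference without the exclusion makes the check fail.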
Finally, we can define the simulation relation for the full machine for kernel threads.
Definition 8.41 (Consistency Relation for Sequential Machine for Kernel Threads). Given configurations $c \in C_{Th}$, $c_{imp} \in C_{MXA}$ of the sequential machine for kernel threads and the implementation MXA machine, a program $\pi$ of the hypervisor / OS kernel, environment parameters $\theta$, $\theta_{imp}$, code base addresses $cba = (cba_{cil}, cba_{\mu})$, and a set $icm \subset A_{hyp} \setminus A^{code}_{MX}(\mathit{info}, cba)$ of memory addresses that cannot be covered by the memory consistency due to environment steps, we define the simulation relation between these machines as
\[
\begin{aligned}
& \mathit{consis}_{Th}(c, c_{imp}, \pi, \theta, \theta_{imp}, cba, icm) \stackrel{def}{\equiv} \\
& \quad \text{(i)}\ \mathit{consis}^{proc}_{Th}(c, c_{imp}, \pi, \theta, \theta_{imp}, cba) \\
& \quad \text{(ii)}\ c.ap = \mathit{ap}_{\theta_{imp}}(c_{imp}.\mathcal{M}) \\
& \quad \text{(iii)}\ \mathit{consis}^{free/new}_{Th}(c, c_{imp}, \pi, \theta, \theta_{imp}, cba) \\
& \quad \text{(iv)}\ \mathit{consis}^{mem}_{Th}(c, c_{imp}, \pi, \theta_{imp}, cba, icm) \\
& \quad \text{(v)}\ \mathit{validpar}_{Th}(\theta, \theta_{imp}) \land \mathit{validcmpl}_{Th}(\pi, \pi_{th}, \pi_{imp})
\end{aligned}
\]
Note that (v) is included in the simulation relation because it is a property of the compiler rather than an assumption for the linker placing the compiled code into the memory.
8.3.4 Accessed Addresses
The sets of addresses read and written during steps of the machine for kernel threads are computed similarly to Definition 7.39. The only difference is that we do not take into account the addresses used for retrieving the stack information (see Definition 7.38). In Definition 7.39, when the end of ISA steps is reached, we search the memory for the base address and the maximal size of the current stack; here, we only test whether the stack pointer and the frame base pointer belong to the stack of the current thread.
Definition 8.42 (Memory Addresses Accessed for Reading and Writing during Steps of Sequential Machine for Kernel Threads). For a configuration c ∈ CTh and an input in ∈ ΣMXA
we use the following shorthands
cmxa ≡ conf curTMXA(c) cmx ≡ conf MXAMX (cmxa)
d ≡ conf MXArMIPS(cmxa) d′ ≡ δrMIPS(d, in)
d̂ ≡ conf startrMIPS(cmxa, in)
în ≡ (core,  Cwalk,  Cwalk, 0256) d̂′ ≡ δrMIPS(d̂, în)
and define the set of read addresses readspi,θTh (c, in, cba) obtained as (i) the MX reads-set for
a mixed machine step, and (ii) in case of the switch to inline assembly or a pure ISA step,
computed as (a) the reads-set of the MIPS-86 model if there is no end of ISA steps or the end
of ISA steps is reached and the stack pointer and the frame base pointer do not match the stack
of the current thread, and (b) if they do, readspi,θTh (c, in, cba) additionally includes the memory
8 Concurrent Kernel Threads: Model, Implementation, and Correctness Criteria
addresses occupied by this stack:
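\[
reads^{\pi,\theta}_{Th}(c, in, cba) \;\stackrel{def}{\equiv}\;
\begin{cases}
reads_{MX}(c_{mx}, \pi, \theta) & : mx(c_{mxa}.k) \wedge \neg start^{\pi}_{isa}(c_{mxa}) \\[4pt]
reads_{MIPS}(\hat{d}, \widehat{in}) & : start^{\pi}_{isa}(c_{mxa}) \wedge \big(\neg end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \vee {} \\
 & \phantom{:} \quad end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \wedge \neg matchst_{cur}(c, \hat{d}')\big) \\[4pt]
reads_{MIPS}(\hat{d}, \widehat{in}) \cup A^{stack}_{MX}(sba_{cur}(c), mss_{cur}(c)) & : start^{\pi}_{isa}(c_{mxa}) \wedge end^{\pi,\theta,cba}_{isa}(\hat{d}'.cpu) \wedge matchst_{cur}(c, \hat{d}') \\[4pt]
reads_{MIPS}(d, in) & : isa(c_{mxa}.k) \wedge \big(\neg end^{\pi,\theta,cba}_{isa}(d'.cpu) \vee {} \\
 & \phantom{:} \quad end^{\pi,\theta,cba}_{isa}(d'.cpu) \wedge \neg matchst_{cur}(c, d')\big) \\[4pt]
reads_{MIPS}(d, in) \cup A^{stack}_{MX}(sba_{cur}(c), mss_{cur}(c)) & : isa(c_{mxa}.k) \wedge end^{\pi,\theta,cba}_{isa}(d'.cpu) \wedge matchst_{cur}(c, d')
\end{cases}
\]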
In turn, the set of written addresses is the same as for the MXA machine:
writespi,θTh (c, in)
def≡ writespi,θMXA(cmxa, in)
Definition 8.43 (No Access to icm by the Machine for Kernel Threads). We combine both sets
into
accadpi,θTh (c, in, cba)
def≡ readspi,θTh (c, in, cba) ∪ writespi,θTh (c, in)
and define a predicate indicating that addresses icm are not accessed
noaccpi,θTh (c, in, icm, cba)
def≡ accadpi,θTh (c, in, cba) ∩ icm = ∅
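The two definitions above amount to a set union followed by a disjointness test. A minimal Python sketch, assuming addresses are modelled as integers in plain sets (the function names mirror the predicates but are otherwise hypothetical):

```python
def accad(reads, writes):
    """All addresses touched by a step: the union of its reads and writes."""
    return reads | writes

def noacc(reads, writes, icm):
    """True if the step touches no address of the region icm."""
    return accad(reads, writes).isdisjoint(icm)

# Toy example: a step reading two words and writing two words.
reads = {0x100, 0x104}
writes = {0x104, 0x200}
icm = {0x300, 0x304}
```

With these toy sets the step does not touch icm; adding 0x300 to the written addresses would violate the predicate.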
8.3.5 IO- and OT -Points
The IO- and OT -points for our machine modeling kernel threads are similar to the ones for
the MXA machine except steps of the primitives different from thread scheduling and exit.
Since these primitives perform global operations, the machine executing such a primitive is
considered to be at IO- and OT -points.
Let the call of the primitive distinct from the thread switch and exit (that are scheduling
primitives) be indicated by the predicate
primnschpi,θTh (c)
def≡
(i) primpi,θTh (c)
(ii) stmtcil = call f(E)∧
f ∈ Fprimname \ {thread run, thread exit to}
Analogously, for the scheduling primitives we have
primschpi,θTh (c)
def≡
(i) primpi,θTh (c)
(ii) stmtcil = call f(E)∧
f ∈ {thread run, thread exit to}
Definition 8.44 (IO- and OT -Points for Kernel Threads). Then, for c ∈ CTh and in ∈ ΣMXA
we define
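\[
IO^{\pi,\theta}_{Th}(c, in, cba) \;\stackrel{def}{\equiv}\;
\begin{cases}
1 & : primnsch^{\pi,\theta}_{Th}(c) \\
IO^{\pi,\theta}_{MXA}(c_{mxa}, in, info, cba) & : \text{otherwise}
\end{cases}
\]
where info is computed as before.
\[
OT^{\pi,\theta}_{Th}(c, in) \;\stackrel{def}{\equiv}\;
\begin{cases}
1 & : primnsch^{\pi,\theta}_{Th}(c) \\
OT^{\pi,\theta}_{MXA}(c, in) & : \text{otherwise}
\end{cases}
\]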
For the implementation machine in configuration cimp ∈ CMXA with an input in ∈ ΣMXA we
will directly use
IOpiimp ,θimpMXA (cimp , in, infoimp , cba) and OT piimp ,θimpMXA (cimp , in)
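The case distinction of Definition 8.44 can be read as an override: a non-scheduling primitive is always an IO- and OT-point, and every other step inherits the flag of the underlying MXA step. The Python sketch below only illustrates this reading; the Boolean argument `is_nonsched_prim` and the precomputed MXA flag are assumptions standing in for the formal predicates.

```python
def io_point_th(is_nonsched_prim, io_point_mxa):
    """IO-point flag of a kernel-thread step (1 = IO-point, 0 = not)."""
    return 1 if is_nonsched_prim else io_point_mxa

def ot_point_th(is_nonsched_prim, ot_point_mxa):
    """OT-point flag, following the same case distinction."""
    return 1 if is_nonsched_prim else ot_point_mxa
```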
8.3.6 Requirements and Conditions for MXA Machine
The requirements and conditions on steps of the MXA machine implementing the kernel threads
are quite simple and are based on the definitions introduced before.
Since the MXA semantics already excludes non-suitable inputs for which the transitions are
not defined, we simply set
suitThMXA(in)
def≡ 1
As for the well-formedness of the MXA machine, we have already defined it in Section 8.3.2 as
wfconf ThMXA(cimp , piimp , θimp , cba)
Finally, the well-behaviour of the MXA machine represents the software conditions from Sec-
tion 7.3.6 needed for the simulation of the extended mixed machine. Therefore, we consider
wbThMXA(cimp , in, piimp , cba)
def≡ scMXA(cimp , in, piimp , infoimp , θimp , cba,StIba)
8.3.7 Software Conditions for Kernel Threads
For the simulation between the machine for kernel threads and the MXA machine implementing
it, we still need to keep the static software conditions introduced in Definition 7.47 because they
represent the requirements on the placement of the compiled code and data into the memory.
Therefore, we restate here the same conditions with a slight modification.
Definition 8.45 (Static Software Conditions for Kernel Threads). For given pi, θimp , and cba,
we define
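\[
sc^{stat}_{Th}(\pi, \theta_{imp}, cba) \;\stackrel{def}{\equiv}\;
\begin{array}[t]{ll}
(i) & A^{code}_{MX}(info_{imp}, cba) \subseteq A_{code} \\
(ii) & validcode_{MX}(info_{imp}, cba) \\
(iii) & A^{gvar}_{CIL}(\pi^{imp}_{cil}, \theta) \setminus A^{const}_{CIL}(\pi^{imp}_{cil}, \theta) \subset A_{data} \\
(iv) & A^{const}_{CIL}(\pi^{imp}_{cil}, \theta) = A_{const} \\
(v) & scprog_{MX}(\pi, \theta)
\end{array}
\]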
Note that (v) is a requirement only on the program of the hypervisor / OS kernel. It will be
used for proving the same property for the linked program in the presence of pith .
In comparison to the dynamic software conditions for the MXA machine in Definition 7.48, the conditions for the kernel threads are not applied to the stack information regions, which are abstracted away in the semantics. Recall that the stack base pointer and the stack maximal size are components of the kernel thread configuration. Moreover, we introduce restrictions on the primitive calls and forbid accesses to the memory containing the implementation of the primitives, TCBs, etc.
Definition 8.46 (Dynamic Software Conditions for Kernel Threads). Given a configuration
c ∈ CTh of the machine for kernel threads, an input in ∈ ΣMXA, and the existing step c′ ≡
δpi,θ,cbaTh (c, in), we state the dynamic software conditions requiring the following: (i) no run-
time error is generated by the step, (ii) the current thread does not access the memory regions
occupied by the compiled code of the framework, stacks of threads (except the case when the
stack abstraction is lost) and by the global variables of the framework pith , (iii) the current thread
does not modify the compiled code of the hypervisor / OS kernel program pi, (iv) during a
step of a primitive other than thread switch and exit no volatile accesses in the evaluation of
the primitive’s arguments are allowed, (v) evaluation of arguments for thread switch and exit
primitives may perform up to one volatile access, (vi) for any MX machine step (except the
primitive call) the dynamic MX software conditions hold, (vii) for any MIPS-86 step the software
conditions needed for the store buffer reduction hold, (viii) before a successful reconstruction of
an MX thread configuration, the masked interrupts must be masked by the programmer again,
and (ix) stacks of threads belong to the data region of the memory.
Let Aimp be the set of addresses at which the memory is not allowed to be accessed by the
thread according to (ii)
Aimp
def≡ (AcodeMX (infoimp , cba) \AcodeMX (info, cba)) ∪
Astacksproc (c.k) ∪Astacksnew (c) ∪Adatath (pith , θimp)
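Then, using the shorthands $d$, $\hat{d}$, $\widehat{in}$ from Definition 8.42, and $\bar{d}$, $\overline{in}$, $\bar{d}'$ given below, we define
\[
sc^{dyn}_{Th}(c, in, \pi, \theta, \theta_{imp}, cba) \;\stackrel{def}{\equiv}\;
\begin{array}[t]{ll}
(i) & c' \neq \bot \\
(ii) & accad^{\pi,\theta}_{Th}(c, in, cba) \cap A_{imp} = \emptyset \\
(iii) & writes^{\pi,\theta}_{Th}(c, in) \cap A^{code}_{MX}(info, cba) = \emptyset \\
(iv) & primnsch^{\pi,\theta}_{Th}(c) \Longrightarrow nIO^{\pi_{cil},\theta}_{CIL}(c_{cil}) = 0 \\
(v) & primsch^{\pi,\theta}_{Th}(c) \Longrightarrow nIO^{\pi_{cil},\theta}_{CIL}(c_{cil}) \leq 1 \\
(vi) & mx(c_{mxa}.k) \wedge \neg start^{\pi}_{isa}(c_{mxa}) \wedge \neg prim^{\pi,\theta}_{Th}(c) \Longrightarrow \\
 & \quad sc^{dyn}_{MX}(c_{mx}, in, \pi, info, \theta, cba, c_{mxa}.k.sba, c_{mxa}.k.mss) \\
(vii) & mx(c_{mxa}.k) \wedge start^{\pi}_{isa}(c_{mxa}) \vee isa(c_{mxa}.k) \Longrightarrow sc_{rMIPS}(\bar{d}, \overline{in}) \\
(viii) & isa(c_{mxa}.k) \wedge mx\big(conf^{curT}_{MXA}(c').k\big) \Longrightarrow \forall i \in \{1, 7\}.\ \bar{d}'.cpu.core.spr(sr)[i] = 0 \\
(ix) & A^{stacks}_{proc}(c.k) \cup A^{stacks}_{new}(c) \subset A_{data}
\end{array}
\]
where the corresponding MIPS-86 configuration $\bar{d}$ and the input $\overline{in}$ are
\[
(\bar{d}, \overline{in}) \equiv
\begin{cases}
(\hat{d}, \widehat{in}) & : mx(c_{mxa}.k) \wedge start^{\pi}_{isa}(c_{mxa}) \\
(d, in) & : isa(c_{mxa}.k)
\end{cases}
\]
and the next MIPS-86 configuration is computed as $\bar{d}' \equiv \delta_{rMIPS}(\bar{d}, \overline{in})$.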
Definition 8.47 (Software Conditions for Kernel Threads). Combining both definitions from
above, we get
scTh(c, in, pi, θ, θimp , cba)
def≡
(i) scstatTh (pi, θimp , cba)
(ii) scdynTh (c, in, pi, θ, θimp , cba)
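To make the set-based part of the dynamic conditions more tangible, the following Python sketch checks conditions (i) to (v) of Definition 8.46 for a single step. It is a simplification under stated assumptions: addresses are integers in sets, the number of volatile accesses is passed in as `n_volatile`, and conditions (vi) to (ix) are omitted because they refer to lower-layer predicates not modelled here. All names are hypothetical.

```python
def sc_dyn_sketch(step_defined, reads, writes, a_imp, a_code_kernel,
                  is_prim, is_sched_prim, n_volatile):
    """Check a simplified version of conditions (i)-(v) for one step."""
    no_error = step_defined                                # (i) no run-time error
    no_impl_access = (reads | writes).isdisjoint(a_imp)    # (ii) framework memory untouched
    code_unmodified = writes.isdisjoint(a_code_kernel)     # (iii) kernel code not written
    if is_prim and not is_sched_prim:
        volatile_ok = n_volatile == 0                      # (iv) non-scheduling primitive
    elif is_prim:
        volatile_ok = n_volatile in (0, 1)                 # (v) thread switch / exit
    else:
        volatile_ok = True                                 # not a primitive call
    return no_error and no_impl_access and code_unmodified and volatile_ok
```

For instance, a call of a scheduling primitive whose argument evaluation performs one volatile access passes, whereas the same access in a non-scheduling primitive does not.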
8.3.8 Sequential Correctness for Kernel Threads in Concurrent Context
Finally, as it was done in the previous chapters, in order to state the sequential correctness
required for the justification of the concurrent model of kernel threads, we introduce auxiliary
definitions which are obtained from Section 7.3.7 by substituting the predicates for MXA by the
corresponding ones defined for the machine with kernel threads in this chapter.
For defined steps of the machine for kernel threads from a configuration c0 ∈ CTh till the next
consistency point, we require that the software conditions hold, the memory region icm is not
accessed, and the machine configuration at the next consistency point is well-formed.
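\[
\begin{array}{l}
SC^{seq}_{Th}(c_0, \pi, \theta, \theta_{imp}, cba, icm) \;\stackrel{def}{\equiv}\; \forall n \in \mathbb{N},\ c \in (C_{Th\bot})^{n+1},\ \lambda \in (\Sigma_{MXA})^{n}. \\
\quad (i)\ \ c_1 = c_0 \wedge \left( c_1 \longrightarrow^{n}_{\delta^{\pi,\theta,cba}_{Th},\,\lambda} c_{n+1} \right) \\
\quad (ii)\ \ \forall i \in [2 : n].\ c_i \neq \bot \Longrightarrow \neg cp_{Th}(c_i, \pi, \theta, cba) \\
\quad (iii)\ \ c_{n+1} \neq \bot \Longrightarrow cp_{Th}(c_{n+1}, \pi, \theta, cba) \\
\Longrightarrow \\
\quad (i)\ \ \forall i \in \mathbb{N}_n.\ sc_{Th}(c_i, \lambda_i, \pi, \theta, \theta_{imp}, cba) \wedge noacc^{\pi,\theta}_{Th}(c_i, \lambda_i, icm, cba) \\
\quad (ii)\ \ c_{n+1} \neq \bot \Longrightarrow wfconf^{\pi,\theta,cba}_{Th}(c_{n+1})
\end{array}
\]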
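The predicate above quantifies over all step sequences up to the next consistency point. For intuition, the following Python sketch checks the same obligations along one given input sequence only. The step function, the consistency-point test, and the condition checkers are passed in as parameters; they and the toy instantiation at the end are assumptions made for illustration, with `None` standing for the error configuration.

```python
def sc_seq_on_trace(c0, inputs, step, is_cp, sc, noacc, wf):
    """Check the obligations along one input sequence until the next consistency point."""
    c = c0
    for lam in inputs:
        # Conclusion (i): software conditions and no access to icm at every step.
        if not (sc(c, lam) and noacc(c, lam)):
            return False
        c = step(c, lam)
        if c is None:      # error configuration: conclusion (ii) is vacuous
            return True
        if is_cp(c):       # conclusion (ii): well-formed at the next consistency point
            return wf(c)
    return True            # no consistency point reached within the given inputs

# Toy instantiation: configurations are integers, multiples of 3 are consistency points.
toy = dict(step=lambda c, lam: c + lam,
           is_cp=lambda c: c % 3 == 0,
           sc=lambda c, lam: lam != 0,
           noacc=lambda c, lam: True,
           wf=lambda c: c != 6)
```

Calling `sc_seq_on_trace(0, [1, 2], **toy)` walks from 0 over 1 to the consistency point 3 and succeeds.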
The restrictions on the number of IO-points between two consistency points and the require-
ment on their proper implementation are formulated in a predicate for non-empty sequences
cimp ∈ (CMXA)∗, σ ∈ (ΣMXA)∗, with |cimp | = |σ| + 1, and c ∈ (CTh)∗, τ ∈ (ΣMXA)∗ with
|c| = |τ |+ 1:
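\[
oneIO^{Th}_{MXA}(c_{imp}, \sigma, c, \tau, \pi, \theta, \theta_{imp}, cba) \;\stackrel{def}{\equiv}\;
\begin{array}[t]{ll}
(i) & \forall i, j \in \mathbb{N}_{|\tau|}.\ IO^{\pi,\theta}_{Th}(c_i, \tau_i, cba) \wedge IO^{\pi,\theta}_{Th}(c_j, \tau_j, cba) \Longrightarrow i = j \\
(ii) & \forall i, j \in \mathbb{N}_{|\sigma|}.\ IO^{\pi^{imp},\theta_{imp}}_{MXA}(c_{imp\,i}, \sigma_i, info_{imp}, cba) \wedge {} \\
 & \quad IO^{\pi^{imp},\theta_{imp}}_{MXA}(c_{imp\,j}, \sigma_j, info_{imp}, cba) \Longrightarrow i = j \\
(iii) & \left( \exists i \in \mathbb{N}_{|\tau|}.\ IO^{\pi,\theta}_{Th}(c_i, \tau_i, cba) \right) \Longrightarrow \left( \exists i \in \mathbb{N}_{|\sigma|}.\ IO^{\pi^{imp},\theta_{imp}}_{MXA}(c_{imp\,i}, \sigma_i, info_{imp}, cba) \right) \\
(iv) & \left( \exists i \in \mathbb{N}_{|\tau|}.\ OT^{\pi,\theta}_{Th}(c_i, \tau_i) \right) \Longleftrightarrow \left( \exists i \in \mathbb{N}_{|\sigma|}.\ OT^{\pi^{imp},\theta_{imp}}_{MXA}(c_{imp\,i}, \sigma_i) \right)
\end{array}
\]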
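Read operationally, the predicate states that each of the two step sequences contains at most one IO-point, that an abstract IO-point is matched by an implementation IO-point, and that OT-points occur in one sequence exactly if they occur in the other. A small Python sketch, assuming the IO- and OT-flags of the individual steps have already been computed and are given as lists of Booleans (a representation chosen only for this example):

```python
def one_io(io_abs, io_imp, ot_abs, ot_imp):
    """Check conditions (i)-(iv) on precomputed per-step IO-/OT-flags."""
    at_most_one_abs = io_abs.count(True) in (0, 1)       # (i)
    at_most_one_imp = io_imp.count(True) in (0, 1)       # (ii)
    io_implemented = (not any(io_abs)) or any(io_imp)    # (iii)
    ot_matched = any(ot_abs) == any(ot_imp)              # (iv)
    return at_most_one_abs and at_most_one_imp and io_implemented and ot_matched
```

For example, one abstract primitive step marked as IO- and OT-point is matched by an implementation sequence in which exactly one step (such as a single call of cas) carries both flags.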
Additionally, the shorthands indicating that there are no consistency points in the given sequence are defined as
\[ nocp_{Th}(c, \pi, \theta, cba) \;\stackrel{def}{\equiv}\; \forall i \in \mathbb{N}_{|c|}.\ \neg cp_{Th}(c_i, \pi, \theta, cba) \]
\[ nocp^{Th}_{MXA}(c_{imp}, \pi, \theta_{imp}, cba) \;\stackrel{def}{\equiv}\; \forall i \in \mathbb{N}_{|c_{imp}|}.\ \neg cp^{Th}_{MXA}(c_{imp\,i}, \pi, \theta_{imp}, cba) \]
Finally, we formally state the theorem for the sequential correctness for kernel threads in con-
current context in the way done before for lower layers of our model stack. Recall that in the
previous chapter we have presented the simulation between the reduced MIPS-86 and MXA
machines such that the existence of the MXA consistency points (see Definition 7.36) satisfy-
ing the requirements from Definitions 6.34, 7.42 is guaranteed. Therefore, in the correctness of
the kernel threads implementation we consider steps of the MXA machine with such consis-
tency points. For brevity, the condition on their existence is silently assumed in the simulation
theorem here.
Theorem 8.1 (Sequential Correctness for Kernel Threads in Concurrent Context).
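\[
\begin{array}{l}
\forall \pi \in Prog_{MX},\ cba \in \mathbb{B}^{32} \times \mathbb{B}^{32},\ \theta, \theta_{imp} \in Params_{CIL}, \\
\quad c_0 \in C_{Th},\ icm \in 2^{\mathbb{B}^{32}},\ k \in \mathbb{N},\ c_{imp} \in (C_{MXA})^{k+1},\ \omega \in (\Sigma_{MXA})^{k}. \\
\quad (i)\ \ wfconf^{\pi,\theta,cba}_{Th}(c_0) \wedge wfconf^{Th}_{MXA}(c_{imp}[1], \pi^{imp}, \theta_{imp}, cba) \\
\quad (ii)\ \ cp_{Th}(c_0, \pi, \theta, cba) \wedge cp^{Th}_{MXA}(c_{imp}[1], \pi, \theta_{imp}, cba) \\
\quad (iii)\ \ icm \subset A_{hyp} \setminus A^{code}_{MX}(info, cba) \wedge consis_{Th}(c_0, c_{imp}[1], \pi, \theta, \theta_{imp}, cba, icm) \\
\quad (iv)\ \ c_{imp}[1] \longrightarrow^{k}_{\delta^{\pi^{imp},\theta_{imp},\iota}_{MXA},\,\omega} c_{imp}[k+1] \\
\quad (v)\ \ nocp^{Th}_{MXA}(c_{imp}[2 : k], \pi, \theta_{imp}, cba) \\
\quad (vi)\ \ SC^{seq}_{Th}(c_0, \pi, \theta, \theta_{imp}, cba, icm) \\
\Longrightarrow \\
\exists n \in \mathbb{N},\ c'_{imp} \in (C_{MXA})^{n+1},\ \sigma \in (\Sigma_{MXA})^{n},\ m \in \mathbb{N},\ c \in (C_{Th})^{m+1},\ \tau \in (\Sigma_{MXA})^{m}. \\
\quad (i)\ \ n \geq k \wedge c'_{imp}[1 : k+1] = c_{imp} \wedge \sigma[1 : k] = \omega \\
\quad (ii)\ \ c'_{imp}[1] \longrightarrow^{n}_{\delta^{\pi^{imp},\theta_{imp},\iota}_{MXA},\,\sigma} c'_{imp}[n+1] \\
\quad (iii)\ \ cp^{Th}_{MXA}(c'_{imp}[n+1], \pi, \theta_{imp}, cba) \wedge nocp^{Th}_{MXA}(c'_{imp}[2 : n], \pi, \theta_{imp}, cba) \\
\quad (iv)\ \ wfconf^{Th}_{MXA}(c'_{imp}[n+1], \pi^{imp}, \theta_{imp}, cba) \wedge \forall i \in \mathbb{N}_n.\ wb^{Th}_{MXA}(c'_{imp}[i], \sigma_i, \pi^{imp}, cba) \\
\quad (v)\ \ c_1 = c_0 \wedge \left( c_1 \longrightarrow^{m}_{\delta^{\pi,\theta,cba}_{Th},\,\tau} c_{m+1} \right) \wedge wfconf^{\pi,\theta,cba}_{Th}(c_{m+1}) \\
\quad (vi)\ \ cp_{Th}(c_{m+1}, \pi, \theta, cba) \wedge nocp_{Th}(c[2 : m], \pi, \theta, cba) \\
\quad (vii)\ \ oneIO^{Th}_{MXA}(c'_{imp}, \sigma, c, \tau, \pi, \theta, \theta_{imp}, cba) \\
\quad (viii)\ \ consis_{Th}(c_{m+1}, c'_{imp}[n+1], \pi, \theta, \theta_{imp}, cba, icm)
\end{array}
\]
with $\iota \equiv (cba, StIbas[pid_{cur}(c_0)])$ and $\pi^{imp} \equiv link(\pi, \pi^{th})$.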
Proof: Here, we show in detail the correctness proof only for the thread creation and thread
switch primitives. The argumentation about other primitives is similar and left as a bookkeep-
ing exercise. In fact, all conditions introduced in this chapter are sufficient for showing the
correctness of the whole kernel threads implementation as well as the required property trans-
fer needed for the pervasive verification in the overall model stack considered in this work.
Thread creation: Let the machine for kernel threads be at the call of the primitive thread create
in the configuration c0. In turn, according to Definition 8.22 the MXA machine in the configura-
tion cimp [1] can be at one of the following consistency points:
1. the call of thread create:
The primitive is called with the same values of the function arguments in both machines
because the consistency relation consisTh(c0, cimp [1], pi, θ, θimp , cba, icm) holds.
According to the well-formedness wf runMXA in Definition 8.28 the MXA machine is at a C-IL consistency point and, therefore, we have apθimp (cimp [1].M) ≠ pid(cimp [1]). Since by consisTh from Definition 8.41 we get c0.ap = apθimp (cimp [1].M) and by consisrunTh in Definition 8.35 we conclude pid(cimp [1]) = pidcur (c0) because the SPRs in both machines are equal, we also have c0.ap ≠ pidcur (c0). The further argumentation depends on the value of c0.ap:
a) c0.ap ∉ {0, pidcur (c0)}: another processor holds the lock.
The machine for kernel threads performs a step from c1 = c0 and stays in the same well-formed configuration, i.e., we have m = 1, δpi,θ,cbaTh (c1, in) = c2, and c1 = c2 = c0.
In turn, in the implementation, if the computation $c_{imp}[1] \longrightarrow^{k}_{\delta^{\pi^{imp},\theta_{imp},\iota}_{MXA},\,\omega} c_{imp}[k+1]$ has not reached the next consistency point, we continue the steps till the configuration c′imp [n + 1] with n > k where the MXA machine is at the second call of cas in acquire lock and, therefore, cpThMXA(c′imp [n + 1], pi, θimp , cba) holds. Otherwise, we already have n = k.
By simple bookkeeping one can show that the configuration c′imp [n + 1] is well-formed because we have only performed operations on the stack of the MXA machine and moved to the next consistency point. Moreover, all performed MXA steps are well-behaved, which can be seen from the executed code.
Since we did not change the configuration of the machine for kernel threads, and the
MXA machine configuration c′imp [n + 1] differs from cimp [1] by only two additional
frames for the functions thread create and acquire lock on the top of its stack, the
consistency relation consisrunTh is not violated and the overall consisTh(cm+1, c′imp [n+
1], pi, θ, θimp , cba, icm) holds.
It is also clear that the predicate oneIOThMXA(c′imp , σ, c, τ, pi, θ, θimp , cba) is satisfied be-
cause the call of the primitive in c1 is marked as IO-, OT -points, and the MXA ma-
chine has executed only a single call of cas.
b) c0.ap = 0: the lock is free.
The machine for kernel threads performs a step from c1 = c0 to c2 ≠ ⊥ (no run-time error by the software conditions) where a new thread th with an ID 〈atid〉 ∈ dom (c1.free) (see the notation in the semantics of the step in Section 8.1.2.2) is created with a stack not overlapping with stacks of other threads and wfthpi,θnew (th) holds.
\[ c_2.free = c_1.free \setminus \{\langle atid \rangle\} \qquad c_2.ap = pid_{cur}(c_1) \]
\[
c_2.new(x) =
\begin{cases}
th & : x = \langle atid \rangle \\
c_1.new(x) & : \text{otherwise}
\end{cases}
\]
One also concludes 〈atid〉 ≠ c1.k.ct because we have c1.k.ct ∈ dom (c1.k.ready) by wfthproc in Definition 8.10 and dom (c1.k.ready) ∩ c1.free = ∅ by Definition 8.12 for wfconf Th . Hence, it is easy to show that the step leads to a well-formed configuration, i.e., wfconf pi,θ,cbaTh (c2) holds.
In the implementation the lock is also acquired by the processor pid(cimp [1]) and we
have to consider the next consistency point, which is, in contrast to the previous
case, the assignment ∗lock = 0 inside release lock . Again, we have either n = k, or
we continue the execution of the MXA machine till c′imp [n+ 1] with n > k if we have
not reached the consistency point in cimp [k + 1].
During these steps by the lock acquisition we get
apθimp (c′imp [n+ 1].M) = pid(c′imp [n+ 1])
By the call of search by tid we find a pointer to a node of the free thread with the same ID 〈atid〉. This operation is feasible because we know 〈atid〉 ∈ dom (c1.free), by Definition 8.37 of consis free/newTh we have $c_1.free = TIDs^{\theta_{imp}}_{free}(c_{imp}[1].\mathcal{M})$, and we have not changed the memory cimp [1].M till the call of search by tid .
After the preparation of the well-formed stack and TCB for the new thread according
to wf newMXA in Definition 8.27, the thread is deleted from the list of free threads by the
function remove and inserted into the list of new threads by insert to end such that
at the end of the steps we have
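\[ TIDs^{\theta_{imp}}_{free}(c'_{imp}[n+1].\mathcal{M}) = TIDs^{\theta_{imp}}_{free}(c_{imp}[1].\mathcal{M}) \setminus \{\langle atid \rangle\} \]
\[ TIDs^{\theta_{imp}}_{new}(c'_{imp}[n+1].\mathcal{M}) = TIDs^{\theta_{imp}}_{new}(c_{imp}[1].\mathcal{M}) \cup \{\langle atid \rangle\} \]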
Taking into account the changes of the MXA machine configuration from above and the fact that the steps have reached the consistency point indicated by cprelvolMXA (c′imp [n + 1].k) we conclude that the well-formedness wfconf ThMXA(c′imp [n + 1], piimp , θimp , cba) holds. Moreover, as in the previous case, the steps are well-behaved.
Finally, since the stack of c′imp [n + 1] differs from the one in cimp [1] by two additional frames for the functions thread create and release lock and in c2 and c′imp [n + 1] we have made only the updates of the components considered above, using the simulation relation consisTh(c0, cimp [1], pi, θ, θimp , cba, icm) we conclude that the simulation relation consisTh(cm+1, c′imp [n + 1], pi, θ, θimp , cba, icm) after the steps in both machines still holds.
As in the previous case, the invariant about the IO- and OT -points is also preserved.
2. the call of cas in acquire lock except for the first iteration:
In this case, the stack of the MXA machine in the configuration cimp [1] has already two
frames for the functions thread create and acquire lock on its top. Moreover, by the re-
lation consisrunTh (c0, cimp [1], pi, θimp , cba) the parameters of thread create on the stack are
coupled with the values of the corresponding arguments in the primitive call in the run-
ning thread. The further proof is the same as for case (1) except for the frames of the stack.
Namely, in case (a) the stack layout is preserved when we reach in c′imp [n+ 1] the next call
of cas in acquire lock , and in case (b) the topmost frame is substituted by the frame for the
function acquire lock .
3. the assignment ∗lock = 0 inside release lock :
Since cprelvolMXA (cimp [1].k) holds, by the well-formedness wf runMXA in Definition 8.28 we get apθimp (cimp [1].M) = pid(cimp [1]) and by the same argumentation as in case (1) we conclude c0.ap = pidcur (c0).
Therefore, after a single step from c1 = c0 the machine for kernel threads has the configuration
\[ c_2 = inc^{cur}_{loc}(c_1)[ap := 0] \]
The configuration c2 is well-formed because no other components (except for ap and the
location in the topmost frame) have been changed.
As the next consistency point for the MXA machine we consider a C-IL statement directly
after the call of thread create in pi. Again, if in cimp [k + 1] we have not reached this con-
sistency point, we continue the execution of thread create until c′imp [n+ 1] after the return
from the primitive. Otherwise, we set n = k. After these steps the lock is released and we
get
apθimp (c′imp [n+ 1].M) = 0
The configuration c′imp [n + 1] is well-formed because we have only dropped two top-
most frames of the stack of the current thread that was well-formed in cimp [1]. The well-
behaviour trivially follows from the executed steps.
Since by consisrunTh from Definition 8.35 before the steps we had thcur (inccurloc (c)) = c̃imp [1], where c̃imp [1] is obtained from cimp [1] by deleting these two frames, and the MXA machine has only released the lock and returned from thread create, we get thcur (c2) = c′imp [n + 1].k and conclude that consisTh(c2, c′imp [n + 1], pi, θ, θimp , cba, icm) holds.
Obviously, the condition oneIOThMXA(c′imp , σ, c, τ, pi, θ, θimp , cba) is satisfied because a sin-
gle IO-/OT -operation is executed in both machines.
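The three situations distinguished in the thread-creation argument (lock held by another processor, lock free, lock held by the current processor) can be summarised by a small abstract step function. The Python sketch below is a strongly simplified illustration: the record `Conf`, its fields, and the function `thread_create_step` are hypothetical, processor IDs are assumed to be non-zero so that 0 can denote a free lock, and everything except the components ap, free, new and the location counter is omitted.

```python
from dataclasses import dataclass, replace

@dataclass
class Conf:
    ap: int          # 0 = lock free, otherwise ID of the processor holding the lock
    pid_cur: int     # ID of the processor running the current thread (assumed non-zero)
    free: frozenset  # IDs of free threads
    new: dict        # newly created threads, indexed by thread ID
    loc: int         # location counter of the topmost frame (simplified)

def thread_create_step(c, atid, th):
    """One abstract step at a call of the thread-creation primitive."""
    if c.ap not in (0, c.pid_cur):
        return c                              # another processor holds the lock: stay
    if c.ap == 0:
        if atid not in c.free:
            raise ValueError("atid must name a free thread")
        # Acquire the lock and move atid from the free threads to the new ones.
        return replace(c, ap=c.pid_cur, free=c.free - {atid},
                       new={**c.new, atid: th})
    # The current processor holds the lock: release it and advance the location.
    return replace(c, ap=0, loc=c.loc + 1)

c0 = Conf(ap=0, pid_cur=1, free=frozenset({5, 6}), new={}, loc=0)
c1 = thread_create_step(c0, 5, "th5")   # lock acquired, thread 5 created
c2 = thread_create_step(c1, 5, "th5")   # lock released, primitive finished
```

The sketch treats the primitive as two abstract steps when the lock is free, which matches the two consistency points (acquisition and release) used in the argument above.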
Thread switch: Both machines in the configurations c0 and cimp [1] are at the call of thread run
with the same (by consisTh ) identifier tidto of the thread we switch to (see the semantics of the
step in Section 8.1.2.2).
By the software conditions we know that any step from c0 does not generate the run-time error and, therefore, tidto ∈ dom (c0.k.ready) holds. From Definition 8.36 of consisprocTh we have $dom(c_0.k.ready) = TIDs^{\theta_{imp}}_{ready}(c_{imp}[1].\mathcal{M}, pid_{cur}(c_0))$. Hence, in the memory of the MXA machine the local list of threads ready for scheduling also contains a node of the thread tidto . Moreover, the current thread ID is equal in both machines: c0.k.ct = ctθimp (cimp [1].M, pidcur (c0)) = tidfrom .
By performing a single step of the machine for kernel threads from c1 = c0 we compute its
new configuration c2. In turn, in the implementation, if cimp [k + 1] is a configuration of the
MXA machine not after the call of thread run , we continue its steps until c′imp [n + 1] similarly
to the proof of the thread creation. The result of the computation depends on whether we try to
switch to the same current thread, or to another thread that can be new or sleeping.
During the steps of the MXA machine we obtain pointers to the nodes of the same threads
tidto and tidfrom because the list of ready threads remains unchanged in the MXA memory since
the call of the primitive.
If both pointers are equal, the case corresponds to tidto = tidfrom and in the implementation we return from thread run . The configuration c′imp [n + 1] differs from c′imp [1] only by the increased location of the topmost frame. The configuration c2 of the machine for kernel threads is computed as c2 = inccurloc (c1). Since by Definition 8.35 of consisrunTh we have thcur (c1) = c′imp [1].k and no other components as well as the memory were changed, the simulation relation between c2 and c′imp [n + 1] is preserved. The argumentation about the well-formedness of c2 and c′imp [n + 1], the well-behaviour of the MXA steps, and IO-/OT -points is trivial too.
If the pointers to the nodes of the threads being switched are not equal, the case corresponds to tidto ≠ tidfrom . The configuration c2 of the machine for kernel threads is obtained from c1 by updating c2.k.ct = tidto , increasing the location in the topmost frame of the previously running thread tidfrom , the transformation of its configuration to a configuration of the non-running thread, and enriching the configuration of tidto by the state of the SPRs and TLB (see Definition 8.8) from c1. Let the configurations of the threads in c1 and c2 be
\[
\begin{array}{ll}
th_{from} \equiv th_{cur}(c_1) & \qquad th_{to} \equiv c_1.k.ready(tid_{to}) \\
th'_{from} \equiv c_2.k.ready(tid_{from}) & \qquad th'_{to} \equiv th_{cur}(c_2)
\end{array}
\]
Hence, in the configuration c2 we have
\[
\begin{array}{ll}
\forall X \in \{ac, ic\}.\ th'_{to}.k_{mx}.X = th_{to}.X & \qquad th'_{to}.k_{mx}.spr = th_{from}.k_{mx}.spr \\
\forall Y \in \{sba, mss\}.\ th'_{to}.Y = th_{to}.Y & \qquad th'_{to}.tlb = th_{from}.tlb
\end{array}
\]
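The equations for the switched threads say that the new running thread takes its contexts and stack bounds from the thread we switch to, while the SPRs and the TLB are inherited from the previously running thread. A minimal Python sketch of this composition follows; the record types `Running` and `NonRunning` and their field types are assumptions for illustration, and the increase of the location in the topmost frame of the old thread is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class Running:       # running thread: contexts, processor state, stack bounds
    ac: str
    ic: tuple
    spr: dict
    tlb: frozenset
    sba: int
    mss: int

@dataclass
class NonRunning:    # non-running thread: contexts and stack bounds only
    ac: str
    ic: tuple
    sba: int
    mss: int

def switch(th_from, th_to):
    """Return the pair (th_from', th_to') after an abstract thread switch."""
    new_to = Running(ac=th_to.ac, ic=th_to.ic,          # contexts from th_to
                     spr=th_from.spr, tlb=th_from.tlb,  # SPRs and TLB from th_from
                     sba=th_to.sba, mss=th_to.mss)      # stack bounds from th_to
    new_from = NonRunning(ac=th_from.ac, ic=th_from.ic,
                          sba=th_from.sba, mss=th_from.mss)
    return new_from, new_to

th_from = Running(ac="run", ic=("f",), spr={"sr": 0}, tlb=frozenset({1}), sba=0x8000, mss=256)
th_to = NonRunning(ac="sleep", ic=("g", "h"), sba=0x9000, mss=512)
new_from, new_to = switch(th_from, th_to)
```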
In the implementation the MXA machine performs further MX steps where the states of
threads in their TCBs are changed, the current thread ID is also updated by tidto , and the MASM
procedure switch stack is called. Note that after this call the stack of the thread tidfrom has
frames for thread run and switch stack on its top. By starting the inline assembly we switch to a
MIPS-86 configuration containing a consistent physical stack of the thread tidfrom in the mem-
ory, the program counter pointing to the start of the inline assembly, the actual stack pointer and
frame base pointer, etc. After saving these pointers to the thread context of tidfrom , one can eas-
ily conclude that the well-formedness wf sleepMXA (see Definition 8.26) of the TCB and the stack for
the sleeping thread tidfrom is satisfied because one can again reconstruct a corresponding MX
thread configuration well-formed for this sleeping thread according to Definition 8.24. During
the execution of the next inline assembly instructions we restore the stack pointer and frame
base pointer and begin to operate on the stack of the thread tidto . After loading a word from
this stack by lw i1 bp 28 into the register i1 (used for passing the first argument during func-
tion/procedure calls according to the calling convention from Section 5.1.3), before the return
from switch stack we attempt to reconstruct an abstract MX thread configuration k′′mx ∈ KMX
because the MIPS-86 machine is at a consistency point corresponding to the start of the com-
piled code of ret. The further execution depends on the state of the thread tidto :
1. the thread tidto is sleeping:
In this case we know that before the steps wf sleepMXA(tidto , c′imp [1].M, piimp , θimp , cba) holds and, therefore, for the sleeping threads there exists k′mx ∈ KMX satisfying wf sleepKMX (k′mx, θimp). Then, by consissleepTh (thto , tidto , c′imp [1].M, pi, θimp , cba) for ni′ ≡ |k′mx.ic|, ic′top ≡ k′mx.ic[ni′] we have
\[ th_{to}.ic = k'_{mx}.ic[1 : ni' - 1] \qquad th_{to}.ac = ic'_{top}[1 : top(ic'_{top}) - 1] \tag{8.2} \]
Since until the execution of ret in switch stack we have not modified the stack and fields of TCB for the thread tidto except for its state, we successfully obtain the abstract MX thread configuration k′′mx (mentioned above) and by the reconstruction conclude
\[ k''_{mx}.ic = k'_{mx}.ic \tag{8.3} \]
The active context of k′′mx is of MASM type and contains a single stack frame for the proce-
dure switch stack , the current state of SPRs equal to the registers content in c′imp [1] because
we have not rewritten them, and the GPRs with i1 updated in the inline assembly. The
topmost frame of its last inactive context corresponds to the function thread run . The fur-
ther steps are performed in the MX semantics, namely, the MASM step for ret and the C-IL
return from thread run . Note that the register i1 is not used during these steps and, therefore, its loaded value is ignored. Formally, after these steps in the configuration c′imp [n + 1] we get for ni′′ ≡ |k′′mx.ic|, ic′′top ≡ k′′mx.ic[ni′′]:
\[
\begin{array}{lcl}
c'_{imp}[n+1].k.k_{mx}.ic & = & k''_{mx}.ic[1 : ni'' - 1] \\
c'_{imp}[n+1].k.k_{mx}.ac & = & ic''_{top}[1 : top(ic''_{top}) - 1] \\
c'_{imp}[n+1].k.k_{mx}.spr & = & c'_{imp}[1].k.k_{mx}.spr \\
c'_{imp}[n+1].k.tlb & = & c'_{imp}[1].k.tlb \\
c'_{imp}[n+1].k.sba & = & sba^{\theta_{imp}}(c'_{imp}[1].\mathcal{M}, tid_{to}) \\
c'_{imp}[n+1].k.mss & = & mss^{\theta_{imp}}(c'_{imp}[1].\mathcal{M}, tid_{to})
\end{array}
\]
Note that along with k′′mx the values of sba and mss belonging to the sleeping thread are
also correctly reconstructed from the stack information abstraction instantiated in Sec-
tion 8.2.2 for the MXA machine. Since we do not change these fields in the TCB of the
thread, they are equal to the values fetched from c′imp [1].M.
Now, using equations (8.2) – (8.3), thfrom = c′imp [1].k from consisrunTh (c1, c′imp [1], pi, θimp , cba), and thto .sba = sbaθimp (c′imp [1].M, tidto), thto .mss = mssθimp (c′imp [1].M, tidto) from the simulation relation consissleepTh (thto , tidto , c′imp [1].M, pi, θimp , cba) we easily obtain
\[
\begin{array}{lcl}
c'_{imp}[n+1].k.k_{mx}.ic & = & th_{to}.ic \\
c'_{imp}[n+1].k.k_{mx}.ac & = & th_{to}.ac \\
c'_{imp}[n+1].k.k_{mx}.spr & = & th_{from}.k_{mx}.spr \\
c'_{imp}[n+1].k.tlb & = & th_{from}.tlb \\
c'_{imp}[n+1].k.sba & = & th_{to}.sba \\
c'_{imp}[n+1].k.mss & = & th_{to}.mss
\end{array}
\]
From the computation of th′to considered above we conclude
\[ th_{cur}(c_2) = c'_{imp}[n+1].k \]
Therefore, after the steps of both machines the simulation relation consisrunTh (c2, c′imp [n + 1], pi, θimp , cba) holds. It is also easy to see that we have consissleepTh (th′from , tidfrom , c′imp [n + 1].M, pi, θimp , cba) because, as mentioned above, during the execution of the inline assembly the well-formedness wf sleepMXA of the TCB and the stack for the thread tidfrom is satisfied and these data structures in the MXA memory are preserved during the MXA steps until c′imp [n + 1]. Since in the machines we have updated only the components mentioned during the proof, we finally conclude that the simulation relation consisTh between c2 and c′imp [n + 1] holds. Moreover, c′imp [n + 1] is well-formed for the kernel threads implementation.
2. the thread tidto is new:
In this case before ret we cannot reconstruct k′′mx because the physical stack (see Figure 8.3)
of the new thread tidto satisfies wf newMXA(tidto , c′imp [1].M, θimp) and no corresponding ab-
stract MX machine stack consistent wrt. the MX compiler consistency exists.
Therefore, we continue MIPS-86 ISA steps executing the compiled code of ret. During
these steps, according to the implementation of ret given in Section 6.1.1.1, we know that
the topmost frame corresponding to the procedure switch stack is dropped and the pro-
gram counter is set to the return address stored in this frame.
Since by wf newMXA we know that this return address is equal to the address of the entry function in the TCB, and by consisnewTh it is coupled with the function name of the stack frame in the abstract configuration of the thread tidto , the MIPS-86 steps proceed with the execution of the prologue (see Section 6.1.1.2) of this function until the next consistency point, where the program counter points to the beginning of the compiled code of the first statement in the body of the thread entry function. Moreover, we know that the argument of the entry function resides in the register i1. Therefore, by reconstructing an abstract MX thread configuration with the base address and maximal size of the stack belonging to the thread tidto , we obtain the well-formed configuration c′imp [n + 1].k with c′imp [n + 1].k = thcur (c2).
By taking into account the argumentation about the thread tidfrom from the previous case,
we finally conclude that the simulation relation consisTh between c2 and c′imp [n+ 1] holds
also in case of the thread switch to the new thread.
In both cases from above the condition oneIOThMXA(c′imp , σ, c, τ, pi, θ, θimp , cba) is satisfied be-
cause we have not performed IO-operations. This finishes the correctness proof of the thread
switch considered in this thesis.
As for the usual MXA steps of the machine for kernel threads, one can easily show that both machines execute the same steps. In case the stack substitution is performed by the kernel in an inline assembly of the program pi, the software conditions for the kernel threads guarantee that the configurations of all threads except for the running one are preserved and no thread switch is performed without the execution of the corresponding primitive.
8.4 Justification of the Concurrent Model
8.4.1 Cosmos Model Instantiations
Given a program $\pi = (\pi_{\mu}, \pi_{cil}) \in Prog_{MX}$ of the hypervisor / OS kernel, the kernel threads framework $\pi^{th} = (\pi^{th}_{\mu}, \pi^{th}_{cil}) \in Prog_{MX}$ such that the linked implementation program $\pi^{imp} \equiv (\pi^{imp}_{\mu}, \pi^{imp}_{cil})$ is computed as $\pi^{imp} = link(\pi, \pi^{th})$, the environment parameters $\theta, \theta_{imp}$ satisfying $validpar_{Th}(\theta, \theta_{imp})$, code base addresses $cba = (cba_{cil}, cba_{\mu}) \in \mathbb{B}^{32} \times \mathbb{B}^{32}$, and $c\iota \equiv (cba, StIbas)$, we define the instantiations $S^{\pi,\theta,\theta_{imp},cba}_{Th}, S^{\pi^{imp},\theta_{imp},c\iota}_{imp} \in \mathbb{S}$ of the Cosmos models for the kernel threads machine and the extended mixed machine implementing it on $np \in \mathbb{N}$ processors.
Concurrent Machine for Kernel Threads
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{A}$ – the set of memory addresses of the Cosmos model for kernel threads must
include not only byte addresses, but also the names new, free, ap of the abstract components
present in the kernel machine configuration. The reason is that all resources that
can be shared have to be modelled by the memory $m : \mathcal{A} \to \mathcal{V}$ in the concurrent machine
state. Moreover, in order to see which threads are owned or shared, we include $\mathbb{N}_{mtid}$ in
this set. Since we get configurations of threads from new, free, and the processor's threads,
for the Cosmos machine memory we set $\forall tid \in \mathbb{N}_{mtid}.\ m(tid) = \bot$.
Let the set of memory addresses not occupied by the compiled code and global variables
of the framework be denoted as
\[ A' \equiv \mathbb{B}^{32} \setminus \left(A^{code}_{MX}(info_{imp}, cba) \setminus A^{code}_{MX}(info, cba)\right) \cup A^{data}_{th}(\pi^{th}, \theta_{imp}) \]
Then, we instantiate the set of addresses $\mathcal{A}$ as
\[ S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{A} = A' \cup \{new, free, ap\} \cup \mathbb{N}_{mtid} \]
where new, free, ap name the abstract component to be addressed.
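• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{V}$ – the set of Cosmos model memory values then includes all possible values
for the addressed components:
\[ S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{V} = \mathbb{B}^{8} \cup (\mathbb{N}_{mtid} \rightharpoonup K^{nrun}_{Th}) \cup 2^{\mathbb{N}_{mtid}} \cup [0 : np] \cup \{\bot\} \]
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{R} = A^{const}_{CIL}(\pi_{cil}, \theta) \cup A^{code} \setminus \left(A^{code}_{MX}(info_{imp}, cba) \setminus A^{code}_{MX}(info, cba)\right)$
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.nu = np$
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{U} = K^{proc}_{Th} \cup \{\bot\}$
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{E} = \Sigma_{MXA}$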
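Before we instantiate other components, we introduce the function
\[ c_{Th}(u, m) \overset{def}{\equiv} c \]
composing the configuration $c \in C_{Th}$ of the sequential machine for the kernel threads on the
basis of the unit's configuration u and the Cosmos model memory m mapping either the read-only
addresses $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{R}$ or all of $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{A}$ to the corresponding values.
The result c depends on whether the abstract components ap, new, free are in the given portion of
the Cosmos machine memory or not. In the latter case we choose arbitrary values of the components
because they are not involved in the underlying computations and are needed only for the
formalism used in the definitions.
\[
c.k = u \qquad
c.M(a) = \begin{cases} m(a) & : a \in dom(m) \\ \text{an arbitrary element of } \mathbb{B}^{32} & : \text{otherwise} \end{cases}
\]
\[
c.ap = \begin{cases} m(ap) & : ap \in dom(m) \\ \text{an arbitrary element of } \mathbb{N}_{np} & : \text{otherwise} \end{cases}
\]
\[
c.new = \begin{cases} m(new) & : new \in dom(m) \\ \text{an arbitrary element of } (\mathbb{N}_{mtid} \rightharpoonup K^{nrun}_{Th}) & : \text{otherwise} \end{cases}
\]
\[
c.free = \begin{cases} m(free) & : free \in dom(m) \\ \text{an arbitrary element of } 2^{\mathbb{N}_{mtid}} & : \text{otherwise} \end{cases}
\]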
Therefore, using c = cTh(u,m) we can now apply the functions and predicates defined for
the kernel threads machine from the previous section.
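The composition of c from a unit configuration and a partial memory can be illustrated by a minimal Python sketch. The class `ThConfig`, the function `c_th`, the string names for the abstract components, and the concrete placeholder values are illustrative assumptions and not part of the formal model; in particular, the placeholders stand for the arbitrary values chosen when a component is absent from dom(m).

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class ThConfig:
    k: Any                  # unit configuration u
    M: Dict[int, int]       # byte memory, restricted to the byte addresses in dom(m)
    ap: int                 # lock component: 0 = free, p = held by processor p
    new: Dict[int, Any]     # new threads: thread identifier -> thread configuration
    free: FrozenSet[int]    # identifiers of free threads


def c_th(u, m):
    """Compose a configuration from unit u and a partial Cosmos memory m.

    Byte addresses are modelled as ints and abstract components as the
    strings "ap", "new", "free" (an assumption of this sketch).
    """
    byte_mem = {a: v for a, v in m.items() if isinstance(a, int)}
    return ThConfig(
        k=u,
        M=byte_mem,
        ap=m.get("ap", 0),                # placeholder if ap is not in dom(m)
        new=m.get("new", {}),             # placeholder if new is not in dom(m)
        free=m.get("free", frozenset()),  # placeholder if free is not in dom(m)
    )
```

Any other placeholder would serve equally well, since the absent components are never inspected by the step being considered.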
• The reads-set of a unit is instantiated as
\[
S_{Th}^{\pi,\theta,\theta_{imp},cba}.reads(u, m, in) =
\begin{cases}
reads^{\pi,\theta}_{Th}(c, in, cba) & : u \neq \bot \land \lnot primnsch^{\pi,\theta}_{Th}(c) \\
reads^{\pi,\theta}_{Th}(c, in, cba) \cup \{ap\} & : u \neq \bot \land primnsch^{\pi,\theta}_{Th}(c) \land c.ap \neq 0 \\
reads^{\pi,\theta}_{Th}(c, in, cba) \cup \{ap, new, free\} & : u \neq \bot \land primnsch^{\pi,\theta}_{Th}(c) \land c.ap = 0 \\
\emptyset & : \text{otherwise}
\end{cases}
\]
Note that we do not include thread identifiers into the reads-set because we do not modify
⊥ in m. For arguing about the ownership memory access policy it is sufficient to consider
new and free in this set.
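The case distinction of the reads-set can be mirrored by a short Python sketch. The function name `reads_inst` and its parameters are hypothetical: `base_reads` stands for the reads-set of the sequential machine, `is_prim_nsch` for the value of the primitive predicate on c, and `None` for the undefined unit ⊥.

```python
def reads_inst(u, base_reads, is_prim_nsch, ap):
    """Reads-set of a unit step, following the four cases of the instantiation."""
    if u is None:                    # undefined unit: nothing is read
        return set()
    if not is_prim_nsch:             # ordinary step: only the sequential reads
        return set(base_reads)
    if ap != 0:                      # primitive step, lock held: only ap is inspected
        return set(base_reads) | {"ap"}
    # primitive step, lock free: the protected components are read as well
    return set(base_reads) | {"ap", "new", "free"}
```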
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\delta(u, m, in)$ – the transition function of the unit is instantiated wrt. the
computation $c' \equiv \delta^{\pi,\theta,cba}_{Th}(c, in)$ of the next configuration of the sequential machine for kernel
threads. If the step is undefined, we obviously have c′ ∉ CTh⊥.
\[
S_{Th}^{\pi,\theta,\theta_{imp},cba}.\delta(u, m, in) =
\begin{cases}
(c'.k, m') & : u \neq \bot \land \lnot primnsch^{\pi,\theta}_{Th}(c) \land c' \neq \bot \\
(c'.k, m'') & : u \neq \bot \land primnsch^{\pi,\theta}_{Th}(c) \land c' \neq \bot \land c.ap = pid_{cur}(c) \\
(c'.k, m''') & : u \neq \bot \land primnsch^{\pi,\theta}_{Th}(c) \land c' \neq \bot \land c.ap = 0 \\
(c'.k, m_\emptyset) & : u \neq \bot \land primnsch^{\pi,\theta}_{Th}(c) \land c' \neq \bot \land c.ap \notin \{0, pid_{cur}(c)\} \\
(\bot, m_\emptyset) & : u \neq \bot \land c' = \bot \\
\text{undefined} & : \text{otherwise}
\end{cases}
\]
where m′, m′′, and m′′′ are the updated portions of the Cosmos machine memory computed as
\[ m' \equiv c'.M|_{writes^{\pi,\theta}_{Th}(c, in)} \]
\[
m''(x) = \begin{cases} c'.ap & : x = ap \\ \text{undefined} & : \text{otherwise} \end{cases}
\qquad
m'''(x) = \begin{cases} c'.x & : x \in \{ap, new, free\} \\ \text{undefined} & : \text{otherwise} \end{cases}
\]
and $m_\emptyset$ is the empty function satisfying $dom(m_\emptyset) = \emptyset$.
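As an illustration of how the written portion of the memory is selected, consider the following Python sketch. It assumes that the next configuration c′ is defined; the function `written_portion`, the dictionary layout of `c_next`, and the parameter names are assumptions made for this example only.

```python
def written_portion(c_next, writes, is_prim_nsch, ap_before, pid):
    """Select the updated portion of the Cosmos memory for a defined step.

    c_next : dict with keys "M" (byte memory), "ap", "new", "free"
    writes : byte addresses written by an ordinary step
    pid    : identifier (>= 1) of the processor performing the step
    """
    if not is_prim_nsch:
        # m': the byte memory restricted to the written addresses
        return {a: c_next["M"][a] for a in writes}
    if ap_before == pid:
        # m'': only the lock component is written back
        return {"ap": c_next["ap"]}
    if ap_before == 0:
        # m''': the lock and the protected components are written
        return {x: c_next[x] for x in ("ap", "new", "free")}
    # lock held by another processor: the empty function
    return {}
```

Since processor identifiers start at 1, the cases `ap_before == pid` and `ap_before == 0` are mutually exclusive, so their order does not matter.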
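• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{IP}(u, m, in) = \left(u \neq \bot \implies cp_{Th}(u, \pi, \theta, cba)\right)$
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{IO}(u, m, in) = \left(u \neq \bot \implies \mathcal{IO}^{\pi,\theta}_{Th}(c, in, cba)\right)$
• $S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{OT}(u, m, in) = \left(u \neq \bot \implies \mathcal{OT}^{\pi,\theta}_{Th}(c, in)\right)$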
MXA Machine Implementing Concurrent Kernel Threads
The instantiation $S_{imp}^{\pi^{imp},\theta_{imp},c\iota} \in \mathbb{S}$ of the Cosmos model with the mixed machine implementing
the kernel threads is almost equal to the MXA instantiation $S_{MX}^{\pi^{imp},\theta_{imp},c\iota}$ from Chapter 7 for $\pi^{imp}$,
$\theta_{imp}$, and the stack information abstraction obtained in Section 8.2.2. The only exception is the
set of interleaving points determined by Definition 8.22.
• For components $X \in \{\mathcal{A}, \mathcal{V}, \mathcal{R}, nu, \mathcal{U}, \mathcal{E}, reads, \delta, \mathcal{IO}, \mathcal{OT}\}$ the instantiation is the same as
for the MXA machine:
\[ S_{imp}^{\pi^{imp},\theta_{imp},c\iota}.X = S_{MX}^{\pi^{imp},\theta_{imp},c\iota}.X \]
• $S_{imp}^{\pi^{imp},\theta_{imp},c\iota}.\mathcal{IP}(u, m, in) = \left(u \neq \bot \implies cp^{Th}_{MXA}(u, \pi, \theta_{imp}, cba)\right)$
8.4.2 Sequential Simulation Theorem
Now, we instantiate the sequential simulation framework $R^{S_{Th}}_{S_{imp}}(\pi, \theta, \theta_{imp}, cba) \in \mathbb{R}$ for our Cosmos
models $S_{Th}^{\pi,\theta,\theta_{imp},cba},\ S_{imp}^{\pi^{imp},\theta_{imp},c\iota} \in \mathbb{S}$. Since we do not have a specific simulation parameter,
we consider it equal to ⊥ and skip it in all predicates.
For configurations $\tilde{c} = (u, m)$, $c \equiv c_{Th}(u, m)$, $\tilde{c}_{imp} = (u_{imp}, m_{imp})$, a subset icm of the Cosmos
model memory, and $icm' \equiv icm \setminus (\{new, free, ap\} \cup \mathbb{N}_{mtid})$ we define the components of
$R^{S_{Th}}_{S_{imp}}(\pi, \theta, \theta_{imp}, cba)$ as
\[
\begin{aligned}
\mathcal{P} &= \bot \\
sim(\tilde{c}_{imp}, \tilde{c}, icm) &= icm' \subset A_{hyp} \setminus A^{code}_{MX}(info, cba) \ \land \\
&\quad \left(u \neq \bot \implies consis_{Th}(c, (u_{imp}.k, m_{imp}), \pi, \theta, \theta_{imp}, cba, icm')\right) \\
CPa(u) &= \left(u \neq \bot \implies cp_{Th}(u, \pi, \theta, cba)\right) \\
CPc(u_{imp}) &= \left(u \neq \bot \implies cp^{Th}_{MXA}(u_{imp}.k, \pi, \theta_{imp}, cba)\right) \\
wfa(\tilde{c}) &= u \neq \bot \land wfconf^{\pi,\theta,cba}_{Th}(c_{Th}(u, m)) \\
wfc(\tilde{c}_{imp}) &= u \neq \bot \land wfconf^{Th}_{MXA}((u_{imp}.k, m_{imp}), \pi^{imp}, \theta_{imp}, cba) \\
suit(in) &= 1 \\
sc(\tilde{c}, in) &= \left(u \neq \bot \implies sc_{Th}(c, in, \pi, \theta, \theta_{imp}, cba)\right) \\
wb(\tilde{c}_{imp}, in) &= wb^{Th}_{MXA}((u_{imp}.k, m_{imp}), in, \pi^{imp}, cba)
\end{aligned}
\]
Note that we use $icm' \in 2^{\mathbb{B}^{32}}$ because in the sequential correctness of the machine for kernel
threads (see Section 8.3.8) the corresponding predicates are defined for byte addresses of possibly
inconsistent memory regions.
Using Theorem 8.1, one can easily prove the generalized sequential simulation Theorem 2.3
for our case and, therefore, can state the correctness of the kernel thread implementation for
any hypervisor / OS kernel program and corresponding parameters.
Theorem 8.2 (Sequential Kernel Threads Correctness for Cosmos Model Simulation). The
generalized sequential simulation Theorem 2.3 holds for any Cosmos models
$S_{Th}^{\pi,\theta,\theta_{imp},cba},\ S_{imp}^{\pi^{imp},\theta_{imp},c\iota} \in \mathbb{S}$
and the simulation framework $R^{S_{Th}}_{S_{imp}}(\pi, \theta, \theta_{imp}, cba) \in \mathbb{R}$ instantiated wrt. any given hypervisor
/ OS kernel program $\pi = (\pi_\mu, \pi_{cil}) \in Prog_{MX}$ with inline assembly, the environment parameters
$\theta, \theta_{imp} \in Params_{CIL}$, system information $c\iota \equiv (cba, StIbas)$, the framework $\pi^{th}$, and the number $np \in \mathbb{N}$
of processors in the multi-core MIPS-86 machine.
8.4.3 Concurrent Model Simulation and Its Application Overview
As the last step for the concurrent model simulation between the machine for kernel threads
and the MXA machine implementing them, we instantiate the missing predicates.
As in the previous chapter, we use the predicate $oginv_{Th}(E)$ requiring for
$E \in C_{S_a}$ with $S_a \equiv S_{Th}^{\pi,\theta,\theta_{imp},cba}$ that the guest addresses are always shared and not owned. Its
formulation can be found in Definition 7.50 and we do not repeat it here.
Definition 8.48 (Ownership Invariant for Kernel Threads Machine). Given a configuration E
of the Cosmos machine for kernel threads, we require that (i) – (ii) threads belonging to a given
processor as well as their stacks are local, (iii) in case the components new and free are accessed
by a processor, ap is shared and owned by this processor whereas new, free, all threads present
in these components, and stacks of the new threads are local. (iv) Otherwise, these components
and addresses are shared and unowned.
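Let $c_p \in C_{Th}$ be computed as $c_p \equiv c_{Th}(E.u_p, E.m)$. Then, we formally define
\[
\begin{aligned}
ocinv_{Th}(E) \overset{def}{\equiv}\ \
&\text{(i)}\quad \forall p \in \mathbb{N}_{nu}.\ dom(E.u_p.ready) \cup dom(E.u_p.fin) \subset E.\mathcal{O}_p \setminus E.\mathcal{S} \\
&\text{(ii)}\quad \forall p \in \mathbb{N}_{nu}.\ A^{stacks}_{proc}(E.u_p) \subset E.\mathcal{O}_p \setminus E.\mathcal{S} \\
&\text{(iii)}\quad \forall p \in \mathbb{N}_{nu}.\ E.m(ap) = p \implies \\
&\qquad \text{(a)}\ \{new, free\} \subset E.\mathcal{O}_p \setminus E.\mathcal{S} \ \land\ ap \in E.\mathcal{O}_p \cap E.\mathcal{S} \\
&\qquad \text{(b)}\ dom(E.m(new)) \cup E.m(free) \subset E.\mathcal{O}_p \setminus E.\mathcal{S} \\
&\qquad \text{(c)}\ A^{stacks}_{new}(c_p) \subset E.\mathcal{O}_p \setminus E.\mathcal{S} \\
&\text{(iv)}\quad E.m(ap) = 0 \implies \\
&\qquad \text{(a)}\ \{ap, new, free\} \subset E.\mathcal{S} \setminus \textstyle\bigcup_{p \in \mathbb{N}_{np}} E.\mathcal{O}_p \\
&\qquad \text{(b)}\ dom(E.m(new)) \cup E.m(free) \subset E.\mathcal{S} \setminus \textstyle\bigcup_{p \in \mathbb{N}_{np}} E.\mathcal{O}_p \\
&\qquad \text{(c)}\ A^{stacks}_{new}(c_p) \subset E.\mathcal{S} \setminus \textstyle\bigcup_{p \in \mathbb{N}_{np}} E.\mathcal{O}_p
\end{aligned}
\]
Therefore, the property $P_{S_a}$ on configurations of the Cosmos machine for kernel threads is
specified as
\[ P_{S_a}(E) \overset{def}{\equiv} oginv_{Th}(E) \land ocinv_{Th}(E) \]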
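To make the four clauses of the ownership invariant concrete, the following Python sketch checks them on a toy configuration. The dictionary layout of `E`, the function name `ocinv`, and the use of opaque labels for thread identifiers and stack addresses are assumptions of this illustration; the stacks of the new threads are passed in precomputed as `new_stacks`.

```python
def ocinv(E):
    """Check clauses (i)-(iv) on a toy configuration.

    E["m"]          : memory with entries "ap", "new" (dict), "free" (set)
    E["S"]          : set of shared addresses
    E["O"]          : dict mapping processor id -> set of owned addresses
    E["units"]      : dict mapping processor id -> {"ready", "fin", "stacks"} sets
    E["new_stacks"] : stack addresses of the new threads
    """
    S, O, m = E["S"], E["O"], E["m"]
    for p, u in E["units"].items():
        local = O[p] - S
        if not (u["ready"] | u["fin"]) <= local:   # (i) own threads are local
            return False
        if not u["stacks"] <= local:               # (ii) own stacks are local
            return False
    threads = set(m["new"]) | set(m["free"])
    holder = m["ap"]
    if holder in O:                                # (iii) lock held by a processor
        local = O[holder] - S
        if not ({"new", "free"} <= local and "ap" in (O[holder] & S)):
            return False
        if not (threads <= local and E["new_stacks"] <= local):
            return False
    elif holder == 0:                              # (iv) lock free
        unowned = S - set().union(*O.values())
        if not ({"ap", "new", "free"} | threads | E["new_stacks"]) <= unowned:
            return False
    return True
```

A configuration in which ap names a processor that has not yet acquired new and free violates clause (iii), which is exactly the situation excluded between the lock acquisition and the ownership transfer.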
In the shared invariant we have to couple the ownership states of both machines and the
shared portions of their memories, as well as to indicate that such shared memories are well-
formed.
First, we introduce shorthands for the computation of addresses allocated for the data struc-
tures declared in the framework pith (see Section 8.2.1.1):
• pointers to the head and the tail of the global list of new threads:
\[ A_{new} \equiv \bigcup_{x \in \{hd, tl\}} \{\theta_{imp}.alloc_{gvar}(new\_x)\}_4 \]
• pointers to the global list of free threads:
\[ A_{free} \equiv \bigcup_{x \in \{hd, tl\}} \{\theta_{imp}.alloc_{gvar}(free\_x)\}_4 \]
• the spinlock:
\[ A_{lock} \equiv \{\theta_{imp}.alloc_{gvar}(lock)\}_4 \]
• for a given processor $p \in \mathbb{N}_{np}$ an element of the array of current threads:
\[ A_{cur}(p) \equiv \left\{ ba^{\theta_{imp}}_{elem}(current, u32, p) \right\}_4 \]
• for a given processor $p \in \mathbb{N}_{np}$ pointers to the list of the ready threads:
\[ A_{ready}(p) \equiv \bigcup_{x \in \{hd, tl\}} \left\{ ba^{\theta_{imp}}_{elem}(ready\_x, ptr(Thread), p) \right\}_4 \]
• for a given processor $p \in \mathbb{N}_{np}$ pointers to the list of the finished threads:
\[ A_{fin}(p) \equiv \bigcup_{x \in \{hd, tl\}} \left\{ ba^{\theta_{imp}}_{elem}(fin\_x, ptr(Thread), p) \right\}_4 \]
• addresses occupied by an element of the type Thread for a given thread $tid \in \mathbb{N}_{mtid}$ in the
array threads:
\[ A_{th}(tid) \equiv \left\{ ba^{\theta_{imp}}_{elem}(threads, Thread, tid) \right\}_{size_{\theta_{imp}}(Thread)} \]
• addresses occupied by elements Thread for threads with IDs from a set $A \subset S_{Th}^{\pi,\theta,\theta_{imp},cba}.\mathcal{A}$:
\[ A_{threads}(A) \equiv \bigcup_{tid \in A \cap \mathbb{N}_{mtid}} A_{th}(tid) \]
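The shorthands above all follow one pattern: a base address followed by a fixed number of consecutive bytes. A small Python sketch of this pattern is given below. The constants `THREADS_BASE` and `THREAD_SIZE` are hypothetical layout values chosen for the example, and thread identifiers are modelled as tagged pairs `("tid", n)` so that they cannot be confused with byte addresses.

```python
THREADS_BASE = 0x1000   # hypothetical base address of the array threads
THREAD_SIZE = 16        # hypothetical size in bytes of one element of type Thread


def addr_range(base, n):
    """The n consecutive byte addresses starting at base."""
    return set(range(base, base + n))


def a_th(tid):
    """Addresses occupied by the array element of thread tid."""
    return addr_range(THREADS_BASE + THREAD_SIZE * tid, THREAD_SIZE)


def a_threads(A):
    """Union of the element addresses for all thread identifiers contained in A."""
    result = set()
    for x in A:
        if isinstance(x, tuple) and x[0] == "tid":
            result |= a_th(x[1])
    return result
```

Elements of A that are not thread identifiers, such as byte addresses or the names new, free, ap, contribute nothing to the result.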
Definition 8.49 (Shared Invariant for Concurrent Kernel Threads Simulation). Given parts
$(m, \mathcal{S}, \mathcal{R}, \mathcal{O})$ and $(m_{imp}, \mathcal{S}_{imp}, \mathcal{R}_{imp}, \mathcal{O}_{imp})$ of configurations of the Cosmos machines for kernel
threads and their implementation, we define the shared invariant wrt. Definition 8.48. Using
the following shorthands
\[ \tilde{m} \equiv \lceil m_{imp} \rceil_{\mathbb{B}^{32}} \qquad A_{abs} \equiv \{new, free, ap\} \cup \mathbb{N}_{mtid} \]
\[ \mathcal{S}' \equiv \mathcal{S} \setminus A_{abs} \qquad \mathcal{O}'(p) \equiv \mathcal{O}(p) \setminus A_{abs} \]
\[
\tilde{\mathcal{S}}_{imp}(\mathcal{S}) \equiv \begin{cases} A_{new} \cup A_{free} & : \{new, free\} \subset \mathcal{S} \\ \emptyset & : \text{otherwise} \end{cases}
\]
\[
\tilde{\mathcal{O}}_{imp}(\mathcal{O}, p) \equiv \begin{cases} A_{lock} \cup A_{new} \cup A_{free} & : \{ap, new, free\} \subset \mathcal{O}(p) \\ \emptyset & : \text{otherwise} \end{cases}
\]
and denoting the thread identifiers not existing in the system and memory addresses occupied
by array elements with such IDs⁵
\[ TIDs_\bot(\mathcal{S}, \mathcal{O}) \equiv \{0\} \cup \mathbb{N}_{tid} \setminus \Big( \mathcal{S} \cup \bigcup_{p \in \mathbb{N}_{np}} \mathcal{O}(p) \Big) \]
\[ A_\bot(\mathcal{S}, \mathcal{O}) \equiv A_{ready}(0) \cup A_{fin}(0) \cup A_{threads}(TIDs_\bot(\mathcal{S}, \mathcal{O})) \]
⁵ Recall from Section 8.2.1.1 that the arrays declared in $\pi^{th}$ have elements with indices 0 which are not used for simplicity.
Moreover, in the abstract model for kernel threads we do not require that all thread identifiers from $\mathbb{N}_{mtid}$ are
present, therefore, the array elements with such indices are not accessed in the implementation. This is guaranteed
by the software conditions assuming the absence of run-time errors generated in case of an operation on a thread
with a non-existing identifier.
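we formally have
\[
\begin{aligned}
&sinv^{Th}_{imp}(\pi, \theta, \theta_{imp}, cba)\big((m, \mathcal{S}, \mathcal{R}, \mathcal{O}), (m_{imp}, \mathcal{S}_{imp}, \mathcal{R}_{imp}, \mathcal{O}_{imp})\big) \overset{def}{\equiv} \\
&\text{(i)}\quad \mathcal{R}_{imp} = \mathcal{R} \cup \left(A^{code}_{MX}(info_{imp}, cba) \setminus A^{code}_{MX}(info, cba)\right) \\
&\text{(ii)}\quad \mathcal{S}_{imp} = \mathcal{S}' \cup A_{lock} \cup A_{threads}(\mathcal{S}) \cup \tilde{\mathcal{S}}_{imp}(\mathcal{S}) \cup A_\bot(\mathcal{S}, \mathcal{O}) \\
&\text{(iii)}\quad \forall p \in \mathbb{N}_{nu}.\ \mathcal{O}_{imp}(p) = \mathcal{O}'(p) \cup A_{cur}(p) \cup A_{ready}(p) \cup A_{fin}(p) \cup A_{threads}(\mathcal{O}(p)) \cup \tilde{\mathcal{O}}_{imp}(\mathcal{O}, p) \\
&\text{(iv)}\quad \forall a \in dom(m) \setminus A_{abs}.\ m(a) = m_{imp}(a) \\
&\text{(v)}\quad m(ap) = ap^{\theta_{imp}}(\tilde{m}) \\
&\text{(vi)}\quad m(ap) = 0 \implies \\
&\qquad \text{(a)}\ m(free) = TIDs^{\theta_{imp}}_{free}(\tilde{m}) \\
&\qquad \text{(b)}\ \forall t \in m(free).\ state^{\theta_{imp}}(\tilde{m}, t) = TS\_FREE \\
&\qquad \text{(c)}\ dom(m(new)) = TIDs^{\theta_{imp}}_{new}(\tilde{m}) \\
&\qquad \text{(d)}\ \forall t \in dom(m(new)).\ consis^{new}_{Th}(m(new, t), t, \tilde{m}, \pi, \theta, \theta_{imp}, cba) \\
&\qquad \text{(e)}\ \forall t \in dom(m(new)).\ wfth^{\pi,\theta}_{new}(m(new, t)) \land wf^{new}_{MXA}(t, \tilde{m}, \theta_{imp}) \land state^{\theta_{imp}}(\tilde{m}, t) = TS\_NEW \\
&\qquad \text{(f)}\ valid\_stack_{Th}(m(new)) \\
&\qquad \text{(g)}\ dom(m(new)) \cap m(free) = \emptyset
\end{aligned}
\]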
Note that the addresses $A_\bot(\mathcal{S}, \mathcal{O})$ not used for the implementation of the machine with kernel
threads have to be present in the ownership state for the preservation of the ownership invariant
in the MXA machine. Therefore, as one solution to this issue, we set them shared and not
owned.
Moreover, (vi) in Definition 8.49 repeats for the shared components the corresponding parts
of the well-formedness for both machines and the simulation relation in case of the free lock.
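Clauses (ii) and (iii) of the shared invariant describe how the ownership state of the implementation is derived from the abstract one. The following Python sketch illustrates this derivation. All function and parameter names are assumptions for this example; the address sets such as `a_lock` are passed in as plain sets, and `a_threads` is any function mapping a set of abstract addresses to the addresses of the corresponding array elements.

```python
A_ABS_NAMES = {"ap", "new", "free"}


def is_abstract(a):
    """Abstract addresses: component names and tagged thread identifiers."""
    return a in A_ABS_NAMES or (isinstance(a, tuple) and a[0] == "tid")


def shared_imp(S, a_lock, a_new, a_free, a_threads, a_bot):
    """Shared addresses of the implementation, following clause (ii)."""
    s_prime = {a for a in S if not is_abstract(a)}
    s_tilde = (a_new | a_free) if {"new", "free"} <= S else set()
    return s_prime | a_lock | a_threads(S) | s_tilde | a_bot


def owned_imp(O_p, a_cur_p, a_ready_p, a_fin_p, a_lock, a_new, a_free, a_threads):
    """Addresses owned by unit p in the implementation, following clause (iii)."""
    o_prime = {a for a in O_p if not is_abstract(a)}
    o_tilde = (a_lock | a_new | a_free) if {"ap", "new", "free"} <= O_p else set()
    return o_prime | a_cur_p | a_ready_p | a_fin_p | a_threads(O_p) | o_tilde
```

Note that the lock addresses are always shared and become additionally owned by the unit holding the lock, which mirrors the requirement that ap is shared and owned by that unit.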
Since the instantiated concurrent simulation relation and Definitions 8.48, 8.49 provide
everything needed for the argumentation about the unit's configuration of the MXA machine
implementing concurrent threads, we instantiate the unit invariant as
\[ uinv^{Th}_{imp}(\pi, \theta, \theta_{imp}, cba)(u_p, \mathcal{O}_p, \mathcal{S}) \overset{def}{\equiv} 1 \]
By setting $S_a \equiv S_{Th}^{\pi,\theta,\theta_{imp},cba}$ and $S_c \equiv S_{imp}^{\pi^{imp},\theta_{imp},c\iota}$ we restate and prove here Assumptions 2.1–2.4
needed for the concurrent simulation.
First, we show that the ownership invariant for the MXA machine follows from ocinvTh(E),
the ownership invariant of the machine for kernel threads, and the shared invariant.
Lemma 8.1 (Ownership Invariant for the MXA Machine Implementing Kernel Threads).
∀E ∈ CSa ,∀D ∈ CSc . oinv(E.G) ∧ ocinvTh(E) ∧ sinv(D,E) =⇒ oinv(D.G)
Proof: In order to prove the lemma, we recall from Definition 8.49 how the ownership states of
both machines in E and D are coupled and consider every component in the computation of
the ownership state of the MXA machine.
The owned addresses of each unit p in the MXA machine contain the addresses E.Op \ Aabs ,
which are disjoint according to oinv(E.G) for the machine with threads:
∀p, q. p ≠ q =⇒ E.Op ∩ E.Oq = ∅ (8.4)
Moreover, by the shared invariant any unit p owns only its element of the array of current
threads as well as the pointers to the lists of ready and finished threads. Since according to
ocinvTh(E) and (8.4) the abstract components ap, new , and free can also be owned only by a
single unit in the machine, this holds also for the lock and the pointers to the lists of free and
new threads in the implementation. By (8.4) the sets of owned thread identifiers are disjoint for
p ≠ q. Moreover, in the implementation each thread is represented by a separate element of type
Thread in the array threads . Hence, Athreads(D.Op) ≠ Athreads(D.Oq) holds and we conclude
the first part of oinv(D.G): ∀p, q. p ≠ q =⇒ D.Op ∩ D.Oq = ∅.
By the shared invariant the owned and shared addresses in the implementation do not con-
tain the code region. Moreover, in the framework pith we do not have variables declared as
constants. All constants present in the linked program come from the program pi of the hyper-
visor / OS kernel and, therefore, belong to the read-only addresses of the machine for kernel
threads. Since these addresses are unowned and not shared according to oinv(E.G), we obtain
the second part of oinv(D.G):
∀p. D.Op ∩ Sc.R = ∅ ∧ D.S ∩ Sc.R = ∅
As follows from the instantiations of both machines, their address spaces differ only by the
addresses occupied by the compiled code and data structures of the framework $\pi^{th}$. According
to the shared invariant, the read-only memory of the MXA machine contains this compiled
code and all data structures are mentioned in the computation of the ownership state in the
implementation. Therefore, the last part of oinv(D.G) holds:
\[ S_c.\mathcal{A} = S_c.\mathcal{R} \cup D.\mathcal{S} \cup \bigcup_{p \in \mathbb{N}_{nu}} D.\mathcal{O}_p \]
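The three parts of the ownership invariant established in this proof can be summarized by a short executable check. The Python sketch below is an illustration on finite sets only; the function name `oinv` and its parameters are assumptions of the example, with `O` mapping each unit to its set of owned addresses.

```python
def oinv(A, R, S, O):
    """Check the three parts of the ownership invariant on finite sets.

    A : all addresses, R : read-only addresses, S : shared addresses,
    O : dict mapping unit id -> set of owned addresses
    """
    owned = list(O.values())
    # part 1: the owned sets of different units are pairwise disjoint
    disjoint = all(
        owned[i].isdisjoint(owned[j])
        for i in range(len(owned))
        for j in range(i + 1, len(owned))
    )
    # part 2: read-only addresses are neither owned nor shared
    read_only_clean = S.isdisjoint(R) and all(o.isdisjoint(R) for o in owned)
    # part 3: every address is read-only, shared, or owned by some unit
    covered = A == R | S | set().union(*owned)
    return disjoint and read_only_clean and covered
```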
We discharge Assumption 2.1 by proving the following lemma.
Lemma 8.2 (Safety Transfer and Invariants Preservation for Correctness of Concurrent Kernel
Threads).
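\[
\begin{aligned}
&\forall D \in C_{S_c},\ d' \in M_{S_c},\ E, E' \in C_{S_a},\ \sigma \in \Theta^*_{S_c},\ \tau \in \Theta^*_{S_a},\ o_\tau \in \Omega^*_{S_a},\ p \in \mathbb{N}_{nu}. \\
&\quad \text{(i)}\ \ D.M \overset{\sigma}{\longmapsto} d' \land blk(\sigma, p) \land oneIO(\sigma, \tau) \land wb(D.M, \sigma) \land wf_p(d') \\
&\quad \text{(ii)}\ \ E \overset{\langle \tau, o_\tau \rangle}{\longmapsto} E' \land blk(\tau, p) \land P_{S_a}(E) \land safe_{P_{S_a}}(E, \langle \tau, o_\tau \rangle) \land sc(E.M, \tau) \land wf_p(E'.M) \\
&\quad \text{(iii)}\ \ csim_p(D.M, E) \land sinv(D, E) \land csim_p(d', E') \\
&\implies \exists o_\sigma \in (\Omega_{S_c})^*,\ \mathcal{G}' \in \mathcal{G}_{S_c}. \\
&\quad \text{(i)}\ \ D \overset{\langle \sigma, o_\sigma \rangle}{\longmapsto} (d', \mathcal{G}') \land safe(D, \langle \sigma, o_\sigma \rangle) \\
&\quad \text{(ii)}\ \ sinv((d', \mathcal{G}'), E')
\end{aligned}
\]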
Proof: The safety safe(D, ⟨σ, oσ⟩) requires that the ownership invariant oinv(D.G) holds, which
follows directly from Lemma 8.1.
In order to prove the ownership safety during the MXA steps and the shared invariant after
them, one should consider cases depending on the step of the abstract machine and the consistency
point in the implementation, in a way similar to the proof of Theorem 8.1.
Thread switch: During the thread switch both machines operate on local data structures and
no ownership transfer is performed. Therefore, we choose the ownership transfer information
oσ containing empty sets of addresses for each step of the MXA machine.
From ocinvTh(E) and the shared invariant we know that all threads from dom (E.up.ready)
and their elements in the array threads are locally owned. Therefore, any access to a TCB by
such a thread identifier is safe. Moreover, the index of the current thread to be rewritten resides
in the memory at the addresses Acur (p), which are local according to the shared invariant.
Since from $consis^{proc}_{Th}$ we have $dom(E.u_p.ready) = TIDs^{\theta_{imp}}_{ready}(D.m, p)$, we conclude that the
local list of the ready threads in the implementation contains only locally owned nodes of all
threads from dom (E.up.ready). Therefore, the function search_by_tid traversing this list accesses
local addresses. During the thread reconstruction after the execution of the inline assembly in
the procedure switch_stack we access the stack information abstraction (in the memory), which is also
instantiated by the list of these ready threads.
Moreover, any operations on the stacks of the threads to be switched are safe because the
stacks of all processor’s threads are allocated at the memory addresses owned and not shared
according to ocinvTh(E).
Since the ownership state and the shared memory are not changed during the steps in both
machines, the shared invariant is trivially preserved.
The proof of the lemma for the primitive thread_exit_to is the same except for the additional
argumentation about the locally owned list of finished threads.
Pure MXA step of the abstract machine: Since both machines perform the same steps, we
choose oσ with the ownership transfer applied for the abstract machine.
The safety of the memory accesses and the ownership transfer in the implementation machine
then mostly follows from PSa(E) ∧ safePSa (E, 〈τ, oτ 〉). The only exception is the context switch
back to the current thread performed in the abstract machine. In the implementation during the
reconstruction we access the stack information abstraction. Hence, the argumentation about the
safety of the memory accesses is the same as in the case for the thread switch above.
The shared invariant holds after the steps because we have csimp(d′, E′), well-formedness
wfp(d′), wfp(E′.M), and the ownership states in both machines are changed in the same way.
Thread creation: During the thread creation by the unit p the ownership state of both machines
is changed when we acquire and release the lock. The lemma is proven by a case split on
E.m(ap).
1. E.m(ap) ∉ {0, p}: the lock is held by another processor.
After reading only E.m(ap) the abstract machine stays in the same configuration and the
ownership state remains unchanged. In the implementation the machine either calls the
primitive and executes its code until the second call of cas or performs a single iteration
in the loop of the function acquire_lock. Therefore, we set oσ with empty sets for all steps
of the MXA machine.
From ocinvTh(E), oinv(E.G), and the shared invariant we conclude that the addresses
Alock are shared and not owned by p. Since the MXA machine reads the memory at these
addresses at an IO-point, the memory access policy is not violated. If the primitive is
called in the implementation, the expression evaluation of the arguments is also safe because
the same memory addresses are accessed in both machines and safePSa (E, ⟨τ, oτ⟩)
holds.
Moreover, the shared invariant after the steps follows from the fact that neither the ownership
state nor the shared memory is changed.
2. E.m(ap) = 0: the lock is free.
In this case according to ocinvTh(E) we know that ap, new , free, all threads from E.m(free),
dom (E.m(new)), and their stack addresses $A^{stacks}_{new}(c_{Th}(E.u_p, E.m))$ are shared and not
owned. Without violating the ownership safety of the step, the machine for kernel threads
acquires them and makes them local except for the shared and owned ap. After updating
these components and setting E.m(ap) = p, the ownership is no longer changed.
Using ocinvTh(E), the shared invariant for E and D, and the simulation relation we also
conclude that the memory addresses occupied by the lock, the pointers to the lists of free
and new threads, the array elements corresponding to these threads as well as stacks of
new threads are shared and not owned by any unit. Moreover, the lists of new and free
threads contain only such array elements.
Therefore, we choose oσ in such a way that during the execution of cas in the function
acquire_lock all these addresses are acquired and made local except for the addresses
of the lock. For all other steps the ownership transfer information has empty sets and
does not change the ownership state of the MXA machine. Directly from this setting we
conclude the ownership safety of the MXA steps implementing the step of the thread in
this case.
After the lock acquisition all memory accesses during the traversal of the lists of new and
free threads, the accesses to the TCB of the new thread (found in the list) and its stack are
safe.
From sinv(D,E), csimp(d′, E′), and the updates of the ownership states in both machines
one easily concludes that the ownership invariant is preserved after the steps.
3. E.m(ap) = p: the lock is held by processor p.
According to ocinvTh(E) we know that new , free, all threads from E.m(free), new threads
from dom (E.m(new)), and their stack addresses $A^{stacks}_{new}(c_{Th}(E.u_p, E.m))$ are owned by p and
not shared. The component ap is shared and owned by p. During the step of
the abstract machine these components are released and ap is set to 0.
By ocinvTh(E), csimp(D.M,E), and sinv(D,E) one easily sees that in the implementation
the memory addresses occupied by the aforementioned abstract components belong to
the sets of owned and shared addresses of the MXA machine in the same way as it is
mentioned above for the machine with threads.
By executing the assignment ∗lock = 0 inside release_lock we release all these addresses.
Obviously, the lock write is a safe memory access because the lock is owned by unit p. The
ownership transfer information oσ contains the addresses to be released by the step performing
∗lock = 0 and empty sets for all other MXA steps. Therefore, the ownership
transfer policy is not violated.
One can easily show that sinv((d′,G′), E′) holds by sinv(D,E), csimp(d′, E′), the updates
of the ownership states, and well-formedness wfp(d′), wfp(E′.M) for unit p after its steps.
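The interplay between the lock value and the ownership transfer in the three cases can be sketched as a small sequential simulation in Python. This is a simplified model and not the framework's code: the helper names `cas`, `try_acquire`, and `release` are chosen for the example, the protected resources are reduced to the two names new and free (the thesis additionally transfers the thread identifiers and stacks), and the atomicity of cas is taken for granted.

```python
PROTECTED = {"new", "free"}   # resources guarded by the lock in this sketch


def cas(mem, addr, expected, new):
    """Compare-and-swap, assumed atomic: returns the old value."""
    old = mem[addr]
    if old == expected:
        mem[addr] = new
    return old


def try_acquire(mem, S, O, p):
    """One attempt of unit p to take the lock; transfers ownership on success."""
    if cas(mem, "ap", 0, p) != 0:
        return False               # case 1: lock held elsewhere, nothing changes
    S -= PROTECTED                 # case 2: protected resources become local to p
    O[p] |= PROTECTED | {"ap"}     # ap stays shared and becomes owned by p
    return True


def release(mem, S, O, p):
    """Case 3: unit p writes 0 to the lock and releases everything it acquired."""
    assert mem["ap"] == p, "only the holder may release the lock"
    mem["ap"] = 0
    O[p] -= PROTECTED | {"ap"}
    S |= PROTECTED
```

A failed attempt only reads the shared lock, which matches the empty ownership transfer chosen for the first case; the whole transfer is attached to the single successful cas and to the single write releasing the lock.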
The proof of Assumption 2.4 is shown in Lemma 8.3.
Lemma 8.3 (Preservation of Simulation Relation for Concurrent Kernel Threads).
∀D,D′ ∈ CSc , E,E′ ∈ CSa , p ∈ Nnu.
(i) csimp(D.M,E)
(ii) sinv(D,E) ∧ PSa(E) ∧ oinv(E) ∧ oinv(D)
(iii) sinv(D′, E′) ∧ PSa(E′) ∧ oinv(E′) ∧ oinv(D′)
(iv) E ≈p E′ ∧D ≈p D′
=⇒ csimp(D′.M,E′)
Proof: First, we consider the preservation of parts of consisTh not depending on the value of
the lock. From E ≈p E′ and D ≈p D′ we know that in both machines the configurations of unit
p, the memory owned by p, and its sets of owned addresses are equal. Since csimp(D.M,E)
holds and the consistency for the shared memory as well as E′.m(ap) is covered by sinv(D′, E′)
we conclude that for D′ and E′ the simulation relations $consis^{proc}_{Th}$, $consis^{mem}_{Th}$, and the equality
for the lock are preserved.
In order to prove $consis^{free/new}_{Th}$ between D′ and E′, we make a case split on the values of E.m(ap)
and E′.m(ap).
1. E.m(ap) = 0 ∧ E′.m(ap) ∉ {0, p}: The components free and new are locked by another
unit q ≠ p. Therefore, the unit p does not cover them in its simulation relation and
$consis^{free/new}_{Th}$ holds by definition (the implication in the relation is true).
2. E.m(ap) = 0 ∧ E′.m(ap) = 0: The components free, new are not locked in E′. By
ocinvTh(E′) they belong to the shared memory of the Cosmos machine and are covered
by the shared invariant in the same way as stated in $consis^{free/new}_{Th}$.
3. E.m(ap) = 0 ∧ E′.m(ap) = p: By ocinvTh(E) and ocinvTh(E′) we obtain ap ∉ E.Op
and ap ∈ E′.Op. Since the owned addresses of p are changed, this contradicts the premise
E ≈p E′ requiring the equality of the owned addresses between E and E′. Therefore, the
preservation of the simulation relation does not have to be proven.
4. E.m(ap) = p ∧ E′.m(ap) ∉ {0, p}: By ocinvTh(E), ocinvTh(E′), and oinv(E′) we conclude
ap ∈ E.Op and ap ∉ E′.Op and obtain a contradiction as in the previous case.
5. E.m(ap) = p ∧ E′.m(ap) = 0: Again, in this case we get a contradiction to E ≈p E′,
because the owned sets with ap ∈ E.Op and ap ∉ E′.Op are not equal.
6. E.m(ap) = p ∧ E′.m(ap) = p: In both configurations E, E′ the components ap, new , free,
the identifiers of new and free threads, and addresses occupied by their stacks are owned
by p. The same holds for the corresponding addresses in the implementation machine. By
E.m|E.Op = E′.m|E.Op we conclude E.m(free) = E′.m(free) and E.m(new) = E′.m(new).
Moreover, we also have D.m|E.Op = D′.m|E.Op . Since $consis^{free/new}_{Th}$ holds between E and
D, it is also preserved in E′ and D′.
7. E.m(ap) ∉ {0, p} ∧ E′.m(ap) ∉ {0, p}: The unit p does not cover the new and free threads
in its simulation relation (the implication in $consis^{free/new}_{Th}$ is true).
8. E.m(ap) ∉ {0, p} ∧ E′.m(ap) = 0: In this case, ap, free, new are shared in E′ and the
relation $consis^{free/new}_{Th}$ trivially follows from sinv(D′, E′).
9. E.m(ap) ∉ {0, p} ∧ E′.m(ap) = p: The component ap is not owned by p in E. However, it
is owned in E′. This contradicts again the premise E ≈p E′.
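The structure of this case split can be checked mechanically: by the ownership invariant, ap is owned by p exactly when the lock value equals p, so a case contradicts E ≈p E′ exactly when this ownership differs before and after. The Python sketch below enumerates the nine cases under this reading; the abstraction of the lock value to the three classes `"0"`, `"p"`, and `"q"` (some other unit) and all names are assumptions of the illustration.

```python
from itertools import product

LOCK_VALUES = ("0", "p", "q")   # free, held by p, held by some other unit q


def contradicts_equivalence(ap_before, ap_after):
    """True if the ownership of ap by p differs between E and E'."""
    return (ap_before == "p") != (ap_after == "p")


def classify_cases():
    """Map each pair (E.m(ap), E'.m(ap)) to 'contradiction' or 'to prove'."""
    return {
        (before, after): (
            "contradiction" if contradicts_equivalence(before, after) else "to prove"
        )
        for before, after in product(LOCK_VALUES, repeat=2)
    }
```

Under this abstraction the pairs (0, p), (p, q), (p, 0), and (q, p) are the contradictory ones, corresponding to cases 3, 4, 5, and 9 above, and the remaining five pairs are those for which the relation actually has to be shown.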
For the proof of Assumptions 2.2 and 2.3 about the well-formedness preservation in both
machines we now consider Lemma 8.4 and Lemma 8.5.
Lemma 8.4 (Preservation of Well-Formedness for Other Units in Machine with Kernel Threads).
∀E,E′ ∈ CSa , p ∈ Nnu.
(i) wf p(E.M) ∧ E ≈p E′
(ii) sinv(D′, E′)
(iii) oinv(E) ∧ PSa(E) ∧ oinv(E′) ∧ PSa(E′)
=⇒ wf p(E′.M)
Proof: Since the configurations of unit p in E and E′ are equal by E ≈p E′, we conclude that
$wfth^{\pi,\theta}_{proc}$ and $wfconf^{\pi,\theta,cba}_{MXA}$ in the well-formedness $wfconf^{\pi,\theta,cba}_{Th}$ are preserved. For the rest, depending
on the value of E′.m(ap), we consider the same case split as in Lemma 8.3:
1. E′.m(ap) ∉ {0, p}: the rest of well-formedness holds by definition because it is not covered
if the components ap, new , free are locked by some other unit q ≠ p.
2. E.m(ap) = 0 ∧ E′.m(ap) = 0: From the shared invariant we get
\[ \forall t \in dom(E'.m(new)).\ wfth^{\pi,\theta}_{new}(E'.m(new, t)) \]
\[ dom(E'.m(new)) \cap E'.m(free) = \emptyset \]
\[ valid\_stack_{Th}(E'.m(new)) \]
By ocinvTh(E′) we know that the identifiers of the free and new threads as well as their
stack addresses are shared and unowned. Moreover, IDs of ready, finished threads, and
their stacks are locally owned by p. Therefore, we get that the sets dom (E′.m(new)),
E′.m(free), and dom (E′.up.ready) ∪ dom (E′.up.fin) do not intersect pairwise.
Since the unit configuration is equal in E, E′ and by wf p(E.M) the sets of the ready and
finished threads are disjoint, we conclude that the sets of new, free, ready, and finished
threads in E′ are disjoint too. By the same argumentation for the stack addresses one
shows that the stacks of the ready, finished and new threads do not overlap.
3. E.m(ap) = p ∧ E′.m(ap) = p: In both configurations E, E′ the components ap, new ,
free, the identifiers of the new and free threads and addresses occupied by their stacks
are owned by p. By E.m|E.Op = E′.m|E.Op we conclude E.m(free) = E′.m(free) and
E.m(new) = E′.m(new). The same equality holds also for the finished and ready threads.
Therefore, wf p(E′.M) follows directly from wf p(E.M).
4. E.m(ap) ∉ {0, p} ∧ E′.m(ap) = 0: The proof is the same as for the case with
E.m(ap) = 0 ∧ E′.m(ap) = 0.
As shown in Lemma 8.3, all other cases contradict E ≈p E′.
Lemma 8.5 (Preservation of Well-Formedness for Other Units in Machine Implementing Ker-
nel Threads).
∀D,D′ ∈ CSc , E,E′ ∈ CSa , p ∈ Nnu.
(i) wf p(D.M) ∧D ≈p D′
(ii) csimp(D.M,E) ∧ sinv(D,E) ∧ sinv(D′, E′)
(iii) oinv(E) ∧ PSa(E) ∧ oinv(E′) ∧ PSa(E′)
=⇒ wf p(D′.M)
Proof: From ocinvTh(E) and the shared invariant sinv(D,E) we know that all threads from the
domains dom (E.up.ready) ∪ dom (E.up.fin), their stack addresses, and their elements in the array
threads are locally owned. Since by $consis^{proc}_{Th}$ we have $dom(E.u_p.ready) = TIDs^{\theta_{imp}}_{ready}(D.m, p)$
and $dom(E.u_p.fin) = TIDs^{\theta_{imp}}_{fin}(D.m, p)$, we conclude that the local lists of ready and finished
threads in the implementation contain only locally owned nodes of all threads from the aforementioned
domains. Moreover, according to sinv(D,E) the addresses Acur (p) ∪ Aready(p) ∪
Afin(p) are also local.
Therefore, using the equality of unit configurations for p, its owned addresses, and its owned memory
assumed by D ≈p D′, together with $wf^{proc}_{MXA}$ in D, we easily conclude that $wf^{proc}_{MXA}$ for p in D′ is
preserved.
The proof of $wf^{free/new}_{MXA}$ in D′ depends on $ap^{\theta_{imp}}(D'.m)$ and $ap^{\theta_{imp}}(D.m)$, which are equal to E′.m(ap) and
E.m(ap) respectively by sinv(D′, E′) and sinv(D,E). Hence, we reuse the case split from the
proof of Lemma 8.3:
1. $ap^{\theta_{imp}}(D'.m) \notin \{0, p\}$: Well-formedness $wf^{free/new}_{MXA}$ trivially holds by definition.
2. $ap^{\theta_{imp}}(D'.m) = 0$: Well-formedness $wf^{free/new}_{MXA}$ is fully covered by sinv(D′, E′).
3. $ap^{\theta_{imp}}(D.m) = p \land ap^{\theta_{imp}}(D'.m) = p$: Analogously to the proof for ready and finished
threads, we use ocinvTh(E), sinv(D,E), and $consis^{free/new}_{Th}$ in order to show that the corresponding
lists in the implementation contain only addresses locally owned by p. Moreover,
the stacks of new threads as well as the addresses Anew ∪ Afree are owned by p too.
Therefore, using D ≈p D′ and $wf^{free/new}_{MXA}$ in D for p, we conclude that for unit p
well-formedness of the MXA machine for new and free threads is preserved in D′.
As follows from premise (iii), all other cases contradict D ≈p D′ because the owned addresses
of p in D and D′ are not equal. For instance, in case of $ap^{\theta_{imp}}(D.m) = p \land ap^{\theta_{imp}}(D'.m) = 0$,
using ocinvTh(E) and ocinvTh(E′) we conclude that ap, free, new are owned by p in E and not
owned in E′. By the shared invariant in D and D′ the same holds for the corresponding addresses
occupied by these components in the implementation. Therefore, we conclude D.Op ≠ D′.Op.
This finishes the proof of the well-formedness transfer in the MXA machine implementing
kernel threads.
Having discharged the assumptions for all instantiations, one can state that the Cosmos model
simulation theorem holds for each case.
Theorem 8.3 (Cosmos Model Simulation Theorem for all Programs of Hypervisor / OS Kernels
Using Threads). Theorem 2.4 holds for any kernel programs $\pi$ with inline assembly, the environment
parameters $\theta, \theta_{imp} \in Params_{CIL}$, and the system information $c\iota \equiv (cba, StIbas)$ used for
instantiation of the models $S_{Th}^{\pi,\theta,\theta_{imp},cba},\ S_{imp}^{\pi^{imp},\theta_{imp},c\iota} \in \mathbb{S}$ wrt. the framework $\pi^{th}$.
Finally, in order to justify the concurrent model of kernel threads executed on the multi-core
MIPS-86 machine, one has to prove additionally the following:
• the safety and properties transfer from incomplete consistency blocks to the arbitrary
schedules of $S^{\pi_{imp},\theta_{imp},c\iota}_{imp}$ in the way done in the previous chapters,
• a simple simulation guaranteeing that for any steps of $S^{\pi_{imp},\theta_{imp},c\iota}_{MXA}$ there exist the same
steps of $S^{\pi_{imp},\theta_{imp},c\iota}_{imp}$ such that the machine configurations are coupled and the interleaving
points of $S^{\pi_{imp},\theta_{imp},c\iota}_{imp}$ are in the set of interleaving points of the underlying machine,
• the transfer of the safety and all other properties from $S^{\pi_{imp},\theta_{imp},c\iota}_{imp}$ to the complete
consistency block schedules of $S^{\pi_{imp},\theta_{imp},c\iota}_{MXA}$ required by the concurrent simulation theorem from
Chapter 7.
Then, applying Theorem 7.3 we get complete consistency block schedules of $S^{\pi_{imp},\theta_{imp},c\iota}_{MXA}$ for
which we prove the existence of steps of the Cosmos machine for concurrent kernel threads in
the following way:
• transform a block schedule to a corresponding IP-schedule,
• switch to the corresponding schedule of $S^{\pi_{imp},\theta_{imp},c\iota}_{imp}$ via the simulation introduced above,
• perform the order reduction so that one gets the consistency blocks suitable for the
concurrent threads simulation⁶,
• apply Theorem 8.3 for the reordered sequence.
We leave these technical tasks to the interested reader and do not consider them formally in
this thesis.
⁶ Recall that this is the second application of the order reduction in our model stack. The first one was made on the
steps of the SB-reduced MIPS-86 for obtaining the consistency blocks implementing steps of the MXA machine.

9 Conclusion and Future Work
To the best of our knowledge, the results provided in this doctoral thesis contribute to a number
of important research topics concerning the formal verification of hypervisors and operating
systems.
• The work represents progress towards modeling and verification of multi-threaded
hypervisor and OS kernels as well as formalization of C semantics wrt. the C11
standard dealing with threads. Note that the Verisoft [Ver10] and VerisoftXT [The11]
projects considered the introduction of threads as a topic for future work (see Chapter
7 in [Dö10]). Moreover, Kovalev, who assumes a multi-threaded hypervisor kernel
implementation in [Kov13], claimed that “probably the most complicated part of the kernel layer
verification is the proof of a thread switch mechanism”.
• In the thesis, we give the first argument for the correctness of threads implemented
in a mixture of high- and low-level programming languages with a compiler that optimizes
to some degree. In contrast to existing work [FS05, FSV+06, FSDG08, NYS07,
GFSS12], where the thread switch is similar to the normal context switch saving/restoring
the content of the whole register file, we state the compiler correctness and, relying
on it, manage to implement a correct switch with far fewer operations required
from the system programmer.
• In the attempt to formalize the multi-threaded programming model and to find a clean
implementation of threads simple enough for reasoning about their formal correctness,
we discovered that classical forking typical for Linux kernel programming has an
unnecessarily complicated semantics and operates on processes rather than threads. As a result,
we came up with a simpler implementation that turned out to be similar to the threads given
by the POSIX standard. Based on that and using the achievements of the Verisoft project
concerning the CVM model, where special functions on the model are formalized as prim-
itives, we managed to provide a clean concurrent semantics of kernel threads revealing
hardware features (except for external interrupts) needed for system programming. The
integration of inter-processor interrupts into the concurrent pure C-IL semantics can be
found in [Pen16]. Moreover, the C-dialect C0 with inline assembly for systems containing
disks and allowing external interrupts is considered in [PBLS15].
• In order to argue about the correctness of our threads implementation, we developed the
extended concurrent C-IL+MASM+Asm semantics, which allows describing any stack
substitution not possible in [Sha12] and is, in fact, powerful enough for the implementation of
hypervisor/OS kernels. It is, however, restricted to the case without device steps and interrupts
appearing during C-IL steps.
• In comparison to [Kov13, Pen16] where the authors abstracted away from some impor-
tant technical issues, our work contains a complete model stack where all layers are con-
nected by the same general concurrent simulation theory, the compiler correctness is fully
exposed, and all (hopefully) needed well-formedness requirements, software conditions,
and safety properties are discovered.
Among the directions for future work one can point to:
• the formal mechanized proof of theorems stated in the thesis with the full check of prop-
erties transfer and auxiliary technical, though simple, simulations (e.g., between S′pi,θ,ξrMIPS
and SrMIPS in Section 6.2.4) only sketched in the scope of the work,
• the implementation and verification of the kernel memory manager responsible for the
memory allocation,
• the implementation of the threads manager based on the introduced model of kernel
threads,
• the integration of interrupts and preemption in the thread scheduling via combination of
the results from [Pen16] and [FSDG08, GFSS12] achieved by the FLINT group,
• the creation of initial threads in the kernel for each processor as well as inside any user
process. Regarding the initialization of the kernel on the multi-core machine, the initial
stacks are usually allocated for each core during the booting process and then associated
with such boot-strap threads. This association can be performed by the implementation
of a function preparing a TCB corresponding to a calling thread.
• relaxing the restrictions and limitations of the Cosmos model and order reduction applied
as the main theory in the scope of our work. For instance, one such strong restriction
is the fixed set of read-only memory addresses, which excludes self-modification of the
code. Another one is the dependence of the IO- and IP-points only on this memory. The
discussion on the solutions for these and other issues of the Cosmos model can be found
in [Bau14b]. The development of a more flexible order reduction is in progress at the chair
of Prof. Paul at Saarland University.
• the application of Oberhauser’s [Obe16] effective programming discipline for the store
buffer reduction and the substitution of our safety policy by the new required conditions.
• Last but not least, one has to mention that the introduction of the multi-threaded seman-
tics opens a new direction for the work on the CVM model for the multi-core machine.
Even though one could implement CVM with a number of kernel threads equal to the
number of processor cores (which probably does not need the full model from Chapter 8),
each user process must be represented by a few virtual processors instead of a single
one considered in [PBLS15] because the execution of a user process can now be split into
concurrent threads. Obviously, having such a model of the user process with concurrent
virtual processors, one can easily apply our semantics of threads for the verification of
user programs.
Appendix A: Implementation of Spinlocks
with Processor ID
Listing 9.1: Getting processor ID.
get_pid USES sv1 sv2 sv3 sv4 sv5 sv6 sv7 sv8
1: movs2g pid rv
2: ret
Listing 9.2: Spinlock type.
typedef volatile i32 lock_t;
Listing 9.3: Spinlock initialization.
void init_lock(lock_t *lock)
{
1: *lock = 0;
2: return;
}
Listing 9.4: Acquisition of spinlock
void acquire_lock(lock_t *lock)
{
u32 pid;
i32 acquired;
u32 i;
1: i = 1; // the first cas will be executed
2: pid = call get_pid();
3: call cas(lock, 0, (i32)pid, &acquired);
4: i = 0; // the first cas is executed
5: ifnot (acquired == 0) goto 3;
6: return;
}
Listing 9.5: Release of spinlock
void release_lock(lock_t *lock)
{
1: *lock = 0;
2: return;
}
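For readers more familiar with standard C than with C-IL, the spinlock of Listings 9.2 to 9.5 can be approximated as follows. This is only a sketch under stated assumptions and not the code verified in the thesis: it uses the C11 header <stdatomic.h> in place of the cas primitive, passes the owner tag as a hypothetical parameter owner instead of calling get_pid, and additionally assumes that this tag is non-zero so that it cannot be confused with the value 0 marking a free lock. The helper try_acquire_lock is introduced here for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* 0 means "free"; any other value is the tag of the current owner.
 * Assumption of this sketch: owner tags are never 0. */
typedef _Atomic int32_t lock_t;

static void init_lock(lock_t *lock)
{
    assert(lock != NULL);
    atomic_store(lock, 0);
}

/* One compare-and-swap attempt (hypothetical helper, not in the listings).
 * Returns 1 if this call took the lock and 0 if it was already held. */
static int try_acquire_lock(lock_t *lock, int32_t owner)
{
    assert(lock != NULL && owner != 0);
    int32_t expected = 0; /* the lock can only be taken while it is free */
    return atomic_compare_exchange_strong(lock, &expected, owner);
}

/* Spin until the compare-and-swap succeeds, as in Listing 9.4. */
static void acquire_lock(lock_t *lock, int32_t owner)
{
    while (!try_acquire_lock(lock, owner)) {
        /* busy wait */
    }
}

static void release_lock(lock_t *lock)
{
    assert(lock != NULL);
    atomic_store(lock, 0);
}
```

A caller would initialize the lock once with init_lock and then bracket each critical section with acquire_lock and release_lock. The default sequentially consistent atomics are used here for simplicity; they say nothing about the store buffer reduction discipline discussed in the thesis.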

Appendix B: Implementation of Operations
on Doubly Linked Lists of Threads
Listing 9.6: Search for a list node of a thread with a given ID.
thread_t *search_by_tid(thread_t *hd, u32 tid)
{
1: ifnot (hd != 0 && (hd->tcb).tid != tid) goto 4;
2: hd = hd->next;
3: goto 1;
4: return hd;
}
Listing 9.7: Search for a list node of a thread with a given processor ID.
thread_t *search_by_pid(thread_t *hd, u32 pid)
{
1: ifnot (hd != 0 && (hd->tcb).pid != pid) goto 4;
2: hd = hd->next;
3: goto 1;
4: return hd;
}
Listing 9.8: Removing a thread list node.
void remove(thread_t **hd, thread_t **tl, thread_t *th)
{
1: ifnot (th->prev == 0) goto 4;
2: *hd = th->next;
3: goto 5;
4: (th->prev)->next = th->next;
5: ifnot (th->next == 0) goto 8;
6: *tl = th->prev;
7: goto 9;
8: (th->next)->prev = th->prev;
9: return;
}
Listing 9.9: Inserting a node to the end of a thread list.
void insert_to_end(thread_t **hd, thread_t **tl, thread_t *th)
{
1: ifnot (*tl == 0) goto 7;
2: *tl = th;
3: *hd = th;
4: th->prev = 0;
5: th->next = 0;
6: goto 11;
7: th->next = (*tl)->next;
8: th->prev = *tl;
9: (*tl)->next = th;
10: *tl = th;
11: return;
}
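The goto-based C-IL listings above correspond to ordinary structured C. The following sketch restates three of them in that form so that the intended list invariants (head has no predecessor, tail has no successor) are easier to check by eye. The layout of thread_t and tcb_t is an assumption made for this illustration, with only the fields used by the listings, and the removal function is named remove_node here to avoid a clash with the standard library function remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: only the fields that the listings access. */
typedef struct {
    uint32_t tid; /* thread ID */
    uint32_t pid; /* processor ID */
} tcb_t;

typedef struct thread {
    tcb_t tcb;
    struct thread *prev;
    struct thread *next;
} thread_t;

/* Listing 9.6: walk the list until the ID matches or the list ends. */
static thread_t *search_by_tid(thread_t *hd, uint32_t tid)
{
    while (hd != NULL && hd->tcb.tid != tid)
        hd = hd->next;
    return hd;
}

/* Listing 9.8: unlink th and fix head/tail if th was first/last. */
static void remove_node(thread_t **hd, thread_t **tl, thread_t *th)
{
    assert(hd != NULL && tl != NULL && th != NULL);
    if (th->prev == NULL)
        *hd = th->next;
    else
        th->prev->next = th->next;
    if (th->next == NULL)
        *tl = th->prev;
    else
        th->next->prev = th->prev;
}

/* Listing 9.9: append th; an empty list has *hd == *tl == NULL. */
static void insert_to_end(thread_t **hd, thread_t **tl, thread_t *th)
{
    assert(hd != NULL && tl != NULL && th != NULL);
    th->next = NULL;
    th->prev = *tl;
    if (*tl == NULL)
        *hd = th;
    else
        (*tl)->next = th;
    *tl = th;
}
```

Unlike Listing 9.8, this version leaves the prev and next fields of the removed node untouched as well, so a caller that reuses the node should reinsert it via insert_to_end, which overwrites both fields.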

Appendix C: Long List of “Small” Mistakes
in PhD Dissertations / Books
Shadrin’s PhD Thesis [Sha12]
1. Definition of the inactive C-IL context of the MX machine (Definition 5.5 on page 71):
The callee-save registers in the inactive C-IL context are not needed for the operational
semantics of the MX machine.
2. Computation of the C-IL configuration from the MX machine (Definition 5.7 on page 73, Pure
C-IL step on page 74):
The C-IL configuration obtained from the MX machine does not take into account the
frames of inactive contexts. The proper numbering of the C-IL frames in the overall stack
of the mixed machine is required because the frame index is included into the local refer-
ence value.
3. Run-time errors during pure C-IL and MASM steps of the MX machine (pages 74, 85):
The run-time errors during pure C-IL and MASM steps of the MX machine are not de-
tected and not shown in the MX semantics though they are present in C-IL and MASM
semantics separately. Moreover, in the MX compiler correctness, Shadrin requires that “the
execution of cMX does not get stuck” in the theorem, which is not defined by the MX semantics.
4. Definition of an inter-language step (pages 73-74):
The predicate ext(cMX, π) indicating the inter-language step is not formally defined. Then,
in the definition of the pure C-IL and MASM steps it is used with a stack as a parameter
instead of the MX configuration.
5. Writing the return value during the return from MASM to C-IL (page 75):
The function writeθ is used with a bit-string argument gpr[rv] though it is defined only
on typed C-IL values.
6. Compiler correctness for C-IL and MX (pages 67, 85)
No software conditions, under which the claims of compiler correctness for C-IL and MX
should hold, are stated (in contrast to MASM correctness given in a different form on page
46). The formulations of theorems are partially informal. A claim about the existence of
the next consistency point is missing.
7. C-IL and MX return address consistency (Definition 4.17, 5.17 on pages 63, 81)
Shadrin claims that a memory word at the return address taken from the stack is equal
to the starting address of the compiled code of the C-IL function call. This statement makes
no sense. The correct one must be: the return address taken from the stack is equal to the
starting address of the epilogue. The fact about the epilogue was discovered by Alekhin
and later fixed by Baumann in his thesis [Bau14b].
8. C-IL stack consistency (Definition 4.22 on page 65)
No consistency for parameters and local variables of non-topmost frames is given.
9. MX compiler consistency (page 81)
Callee-save registers present in inactive contexts of the MX machine are not coupled with
the implementation.
Schmaltz’s PhD Thesis [Sch13]
1. Computation of misalignment interrupt (page 68)
Misalignments on fetch and load/store iev(c, I, pff, pfls)[3] correspond to the same interrupt
level and are computed together without taking into account the page fault on fetch
having a lower interrupt level. Therefore, misalignment on load/store can be generated
even in case of page fault on fetch, which, in turn, provides a wrong cause of interrupts.
2. Field reference function for C-IL (Definition 5.35 on page 123):
The function $\sigma^{\pi}_{\theta}(x, f)$ is defined only for values x of pointer types. Therefore, the C-IL
semantics does not allow accessing a field in an array of structures though such arrays are
pretty common in programming practice.
3. C-IL transition function for goto and if-not-goto (Definition 5.41 on page 127)
The C-IL transition function is defined for any statements ifnot e goto l and goto l though
the label l must be only in the body of the function. This condition is missing.
4. Return from C-IL function (Definition 5.41 on page 127)
The semantics of the C-IL return step does not have a condition requiring that the stack
contains more than one stack frame.
Baumann’s PhD Thesis [Bau14b]
1. Sequential simulation relation in concurrent settings (Definition 79 on page 155)
The simulation relation sim : Lc × La × P → B of the sequential simulation framework
does not take into account sets of addresses of possible inconsistent regions of memory
modified by concurrent steps of other units. Therefore, it is not possible to state the correct
sequential simulation theorem needed for the concurrent simulation.
2. Requirement on IO-points in consistency blocks (page 158, already reported in [Bau14a])
The definition oneIO(σ, τ) does not exclude the case with σ|io ≠ ε ∧ τ|io = ε and, therefore,
is incorrect.
3. Generalized sequential simulation theorem (Theorem 5 on page 159)
The generalized sequential simulation theorem does not consider environment steps and,
hence, is not suitable for the concurrent simulation.
4. Concurrent simulation relation (page 162, Theorem 6 on page 166)
The simulation relation sim(d, par, e) = ∀p ∈ Uc(d, par). simp(d, par, e) for units at consistency
points is only based on machine configurations without the ownership state of
the abstract machine and, therefore, for any unit p it does not exclude memory addresses
locally owned by other units q ≠ p. The reason is the absence of inconsistent memory
as a parameter in the sequential simulation relation. When not every unit reaches a consistency
point (as is considered in the Cosmos model simulation theorem), it is not
possible to show that this simulation relation holds. In the Cosmos model simulation theorem
it is also claimed to hold for any existing executions of the abstract machine, though it is
only possible for the safe ones.
5. Safe abstract schedules in Cosmos model simulation theorem (Theorem 6 on page 166)
Ownership safety transfer and shared invariant are claimed to hold for any safe abstract
schedules. The additional safety property Pa of the abstract machine is not considered in
these schedules though it is needed for the concurrent simulation.
6. Coupling of units’ indices in Cosmos model simulation theorem (Theorem 6 on page 166)
The theorem does not guarantee that a consistency block in the implementation and spec-
ification belongs to the unit with the same index.
7. Assumption 1 (page 164)
Safety of computations in the consistency block does not respect the property Pa needed
for the simulation. The concurrent simulation relation does not take the ownership as a
parameter. Moreover, well-formedness for both machines at the end of their consistency
block computations is not present among the premises in the assumption though it is
needed for proving the shared invariant.
8. Assumption 2 (page 165)
The ownership invariant (see Definition 12) for machine states is not present. The sequen-
tial simulation relation is used without the ownership state. In the assumption for the
abstract machine, Pa and oinv allowing to derive needed properties about owned and
shared components and the memory in configurations E and E′, are not present. The
same holds for the assumption for the concrete machine, where one additionally needs
the simulation relation and the shared invariant for E and D in order to transfer Pa from
the abstract machine to the concrete level.
9. Transfer of a few properties (Theorem 8, page 182)
The conjunction of predicates in safety(D,Q[P, par] ∧W, suit) is not a standard notation
and it is not defined in the thesis.
10. Transfer of well-behavior from block schedules to arbitrary interleaved schedules (pages 175, 182)
The computation of the well-behaviour flag d′.u(p).wb is shown incorrectly because it
cannot be based on the full memory. Instead, one can only use the memory covered by
the reads-set. No such restriction on the instantiation of the well-behaviour predicate is
stated.
11. Offset of local variables in C-IL compiler information (page 128)
The C-IL compiler information function infoIL.offlvar computing the offset of a given local
variable on the stack relies on the location in the body of a given function. However,
this offset is static after the compilation because the allocation of the local variables is
performed in the prologue part of the callee and cannot be changed during the function
execution. Therefore, the location as a parameter for infoIL.offlvar is redundant. This
parameter is not used in [Sha12] and was introduced in [Bau14b].
12. Computation of the distance between frame base addresses in C-IL stack (page 143)
In the computation dist(i) for a non-topmost frame i the size of the caller-save region is
sizeCrS(fi, loci) where loci points to a statement after the call. However, this size must be
calculated at the location loci − 1 of the function call.
13. C-IL local variable consistency (Definition 70 on page 134, already reported in [Bau14a])
In the computation of crsai,j the author uses infoIL instead of infoIL.crso. Moreover,
csrbasei must be crsbasei.
14. C-IL local variable consistency (Definition 70 on page 134, already reported in [Bau14a])
Local variables of a stack frame i are in callee-save registers saved in the same frame, i.e.
ME i(vi,j) = h.m4(csai,j) is claimed. Instead, one has to consider the values h.m4(csai+1,j)
in the next frame.
15. Return value destination in C-IL stack consistency (Definition 71 on page 135)
The fact that the return value can be stored into a variable used as a parameter in the
function was not taken into account.
16. Volatiles in expressions (Definition 64 on page 130)
In the ternary operator e ? e1 : e2 the author tests all the expressions e, e1, and e2 together
on the presence of the volatile accesses. However, for instance, if e and e1 do not contain
volatile accesses, while e2 does, and during a step the condition e is evaluated to one,
the volatile read in e2 would be performed neither in the C-IL nor in the compiled code,
though the whole expression with the ternary operator would contain a volatile access
according to [Bau14b]. Therefore, matching IO-points between the abstract and concrete
machines would become a problem.
17. Stack regions in the sequential simulation framework (page 190)
Though every unit of the concurrent C-IL machine should exclude all stacks (its own and
those belonging to all other units) from its memory consistency, only a single stack region
is treated by every unit. The reason is that the stack base address and the stack maximal
size of a single stack are in the C-IL compiler information which is used by all units in
their simulation relations.
18. Ownership for C-IL and MASM stacks in shared invariants (Definitions 86, 87 on pages 187,
192)
An individual stack region (in the memory of the multi-core MIPS-86 machine) of a unit
is owned by this unit. However, since owned addresses may be shared, additionally, all
stack addresses should be excluded from the shared addresses.
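The point of item 16 above (volatiles in expressions) can be illustrated in plain C. The sketch below is not taken from [Bau14b]; the names device_reg, volatile_reads, read_device, and select_value are hypothetical, and the counter merely stands in for an IO-point being performed. It shows that in e ? e1 : e2, where only e2 contains a volatile access, no volatile read happens when e evaluates to a non-zero value, although a purely syntactic check would classify the whole expression as containing a volatile access.

```c
#include <stddef.h>

/* Illustrative stand-in for a shared or device location. */
static volatile int device_reg = 7;

/* Counts how many volatile reads were actually performed. */
static int volatile_reads = 0;

/* The only place where the volatile location is read. */
static int read_device(void)
{
    volatile_reads++;
    return device_reg;
}

/* e ? e1 : e2 where only the third operand performs a volatile read.
 * C evaluates exactly one of the second and third operands. */
static int select_value(int e, int e1)
{
    return e ? e1 : read_device();
}
```

Calling select_value(1, 5) returns 5 and leaves volatile_reads unchanged, whereas select_value(0, 5) performs the read; a classification of IO-points would therefore have to depend on the evaluated condition rather than on the syntax of the whole expression.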
System Architecture as an Ordinary Engineering
Discipline [PBLS15]
1. Returning to translated C in the inline assembly semantics (Section 13.3.3, page 327)
Though we may reach a consistency point, it is not always possible to return to the abstract
C0 configuration and one needs to continue the ISA execution until one reaches another
consistency point at which the reconstruction is possible. This issue and the solution were
discovered by Alekhin at a time when the original version of the C0 semantics with inline
assembly [PBLS15], based on the reconstruction, did not take this fact into account; it
therefore had to be corrected later in [PBLS16].
Bibliography
[Adv11a] Advanced Micro Devices. AMD64 Architecture Programmer’s Manual Volume 2: Sys-
tem Programming, 3.19 edition, September 2011.
[Adv11b] Advanced Micro Devices. AMD64 Architecture Programmer’s Manual Volume 3:
General-Purpose and System Instructions, 3.16 edition, September 2011.
[AHPP10] E. Alkassar, M. Hillebrand, W. Paul, and E. Petrova. Automated verification of a
small hypervisor. In Third International Conference on Verified Software: Theories, Tools,
and Experiments (VSTTE’10), volume 6217 of LNCS, pages 40–54, Edinburgh, UK,
2010. Springer.
[App11] Andrew Appel. Verified software toolchain. In Gilles Barthe, editor, Programming
Languages and Systems, volume 6602 of Lecture Notes in Computer Science, pages 1–17.
Springer Berlin / Heidelberg, 2011.
[APST10] E. Alkassar, W. Paul, A. Starostin, and A. Tsyban. Pervasive verification of an OS mi-
crokernel: Inline assembly, memory consumption, concurrent devices. In Third In-
ternational Conference on Verified Software: Theories, Tools, and Experiments (VSTTE’10),
volume 6217 of LNCS, pages 71–85, Edinburgh, 2010. Springer.
[Bau14a] Christoph Baumann. Known errata in the dissertation. http://www-wjp.cs.
uni-saarland.de/publikationen/Ba14err.pdf, 2014.
[Bau14b] Christoph Baumann. Ownership-Based Order Reduction and Simulation in Shared-Memory
Concurrent Computer Systems. PhD thesis, Saarland University, Saarbrücken,
2014.
[BC05] Daniel Bovet and Marco Cesati. Understanding The Linux Kernel. O’Reilly & Associates
Inc, 2005.
[BJK+06] Sven Beyer, Christian Jacobi, Daniel Kroening, Dirk Leinenbach, and Wolfgang
Paul. Putting it all together: Formal verification of the VAMP. International Jour-
nal on Software Tools for Technology Transfer, 8(4–5):411–430, August 2006.
[But97] David R. Butenhof. Programming with POSIX Threads. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA, 1997.
[CCK14] Geng Chen, Ernie Cohen, and Mikhail Kovalev. Store buffer reduction with MMUs.
In Dimitra Giannakopoulou and Daniel Kroening, editors, Verified Software: Theories,
Tools and Experiments, Lecture Notes in Computer Science, pages 117–132. Springer
International Publishing, 2014.
[CDH+09] Ernie Cohen, Markus Dahlweid, Mark Hillebrand, Dirk Leinenbach, Michał
Moskal, Thomas Santen, Wolfram Schulte, and Stephan Tobies. VCC: A practical
system for verifying concurrent C. In Stefan Berghofer, Tobias Nipkow, Christian
Urban, and Markus Wenzel, editors, Theorem Proving in Higher Order Logics (TPHOLs
2009), volume 5674 of Lecture Notes in Computer Science, pages 23–42, Munich, Ger-
many, 2009. Springer. Invited paper.
[Che16] Geng Chen. Store Buffer Reduction Theorem and Application. PhD thesis, Saarland
University, Saarbrücken, 2016.
[CL98] Ernie Cohen and Leslie Lamport. Reduction in TLA. In Davide Sangiorgi and
Robert de Simone, editors, CONCUR’98 Concurrency Theory, volume 1466 of Lecture
Notes in Computer Science, pages 317–331. Springer Berlin Heidelberg, 1998.
[CMST10] Ernie Cohen, Michał Moskal, Wolfram Schulte, and Stephan Tobies. Local verifica-
tion of global invariants in concurrent programs. In CAV, pages 480–494, 2010.
[CPS13] Ernie Cohen, Wolfgang Paul, and Sabine Schmaltz. Theory of multi core hypervisor
verification. In Peter van Emde Boas, Frans C. A. Groen, Giuseppe F. Italiano, Jerzy
Nawrocki, and Harald Sack, editors, SOFSEM 2013: Theory and Practice of Computer
Science, volume 7741 of Lecture Notes in Computer Science, pages 1–27. Springer Berlin
Heidelberg, 2013.
[CS10] Ernie Cohen and Bert Schirmer. From total store order to sequential consistency: A
practical reduction theorem. In Matt Kaufmann, Lawrence Paulson, and Michael
Norrish, editors, Interactive Theorem Proving (ITP 2010), Lecture Notes in Computer
Science, Edinburgh, UK, 2010. Springer.
[Dö10] Jan Dörrenbächer. Formal Specification and Verification of a Microkernel. PhD thesis,
Saarland University, Saarbrücken, 2010.
[Deg11] Ulan Degenbaev. Formal Specification of the x86 Instruction Set Architecture. PhD
thesis, Saarland University, Saarbrücken, 2011.
[FS05] Xinyu Feng and Zhong Shao. Modular verification of concurrent assembly code
with dynamic thread creation and termination. SIGPLAN Not., 40(9):254–267,
September 2005.
[FSDG08] Xinyu Feng, Zhong Shao, Yuan Dong, and Yu Guo. Certifying low-level programs
with hardware interrupts and preemptive threads. SIGPLAN Not., 43(6):170–182,
June 2008.
[FSV+06] Xinyu Feng, Zhong Shao, Alexander Vaynberg, Sen Xiang, and Zhaozhong Ni.
Modular verification of assembly code with stack-based control abstractions. SIG-
PLAN Not., 41(6):401–414, June 2006.
[GFSS12] Yu Guo, Xinyu Feng, Zhong Shao, and Peizhi Shi. Modular verification of concur-
rent thread management. In Ranjit Jhala and Atsushi Igarashi, editors, Programming
Languages and Systems: 10th Asian Symposium, APLAS 2012, Kyoto, Japan, December
11-13, 2012. Proceedings, pages 315–331, Berlin, Heidelberg, 2012. Springer Berlin
Heidelberg.
[GHLP05] M. Gargano, M. Hillebrand, D. Leinenbach, and W. Paul. On the correctness of
operating system kernels. In J. Hurd and T. Melham, editors, Theorem Proving in
Higher Order Logics (TPHOLs) 2005, LNCS. Springer, 2005.
[Han96] David R. Hanson. C Interfaces and Implementations: Techniques for Creating Reusable
Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1996.
[HL09] M. Hillebrand and D. Leinenbach. Formal verification of a reader-writer lock
implementation in C. In 4th International Workshop on Systems Software Verification (SSV09),
volume 254 of Electronic Notes in Theoretical Computer Science, pages 123–141. Elsevier
Science B. V., 2009.
[IdR09] Thomas In der Rieden. Verified Linking for Modular Kernel Verification. PhD thesis,
Saarland University, Saarbrücken, 2009.
[IdRT08] T. In der Rieden and A. Tsyban. CVM - a verified framework for microkernel
programmers. In G. Klein, R. Huuck, and B. Schlich, editors, 3rd intl Workshop on Systems
Software Verification (SSV08), volume 217 of ENTCS, pages 151–168. Elsevier Science
B.V., 2008.
[Inc06] The Santa Cruz Operation Inc. System V Application Binary Interface – MIPS RISC
Processor Supplement 3rd Edition. Technical report, SCO Inc., February 2006.
[ISO11] ISO/IEC 9899:201x (Draft N1570): Programming languages — C. Standard,
ISO/IEC, April 2011.
[Kle09] Gerwin Klein. Operating system verification — an overview. Sādhanā, 34(1):27–69,
February 2009.
[KMP14] Mikhail Kovalev, Silvia M. Müller, and Wolfgang J. Paul. A Pipelined Multi-core MIPS
Machine - Hardware Implementation and Correctness Proof, volume 9000 of Lecture Notes
in Computer Science. Springer, 2014.
[Kov13] Mikhail Kovalev. TLB Virtualization in the Context of Hypervisor Verification. PhD
thesis, Saarland University, Saarbrücken, 2013.
[Lam79] L. Lamport. How to make a multiprocessor computer that correctly executes multi-
process programs. IEEE Trans. Comput., 28(9):690–691, September 1979.
[Lei08] Dirk Leinenbach. Compiler Verification in the Context of Pervasive System Verification.
PhD thesis, Saarland University, Saarbrücken, 2008.
[Lov10] Robert Love. Linux Kernel Development. Addison-Wesley Professional, 3rd edition,
2010.
[LPP05] D. Leinenbach, W. Paul, and E. Petrova. Towards the formal verification of a C0
compiler: Code generation and implementation correctness. In 3rd International
Conference on Software Engineering and Formal Methods (SEFM 2005), Koblenz, Ger-
many, 2005.
[LS89] Leslie Lamport and Fred B. Schneider. Pretending atomicity. Technical report, SRC
Research Report 44, 1989.
[LS09] Dirk Leinenbach and Thomas Santen. Verifying the Microsoft Hyper-V Hypervisor
with VCC. In Formal Methods (FM 2009), volume 5850 of Lecture Notes in Computer
Science, pages 806–809, Eindhoven, the Netherlands, 2009. Springer. Invited paper.
[Mic15] Microsoft Corp. MSDN Library: Development Tools and Languages. x64 Software
Conventions. https://msdn.microsoft.com/en-us/library/7kcdt6fy.
aspx, October 2015.
[Mic16] Microsoft Corp. VCC: A C Verifier. http://vcc.codeplex.com, 2016.
[Moo03] J.S. Moore. A grand challenge proposal for formal methods: A verified stack. In
Formal Methods at the Crossroads. From Panacea to Foundational Support., volume 2757
of LNCS, pages 161–172, Heidelberg, 2003. Springer.
[MPR07] Alexander Malkis, Andreas Podelski, and Andrey Rybalchenko. Precise thread-modular
verification. In Hanne Riis Nielson and Gilberto Filé, editors, Static Analysis:
14th International Symposium, SAS 2007, Kongens Lyngby, Denmark, August 22-24,
2007. Proceedings, pages 218–232, Berlin, Heidelberg, 2007. Springer Berlin
Heidelberg.
[NYS07] Zhaozhong Ni, Dachuan Yu, and Zhong Shao. Using XCAP to certify realistic sys-
tems code: Machine context management. In Klaus Schneider and Jens Brandt, edi-
tors, Theorem Proving in Higher Order Logics: 20th International Conference, TPHOLs
2007, Kaiserslautern, Germany, September 10-13, 2007. Proceedings, pages 189–206,
Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[Obe16] Jonas Oberhauser. A simpler reduction theorem for x86-TSO. In Arie Gurfinkel and
Sanjit A. Seshia, editors, Verified Software: Theories, Tools, and Experiments: 7th
International Conference, VSTTE 2015, San Francisco, CA, USA, July 18-19, 2015. Revised
Selected Papers, pages 142–164, Cham, 2016. Springer International Publishing.
[Owe10] Scott Owens. Reasoning about the implementation of concurrency abstractions on
x86-TSO. In Theo D’Hondt, editor, ECOOP 2010 – Object-Oriented Programming,
volume 6183 of Lecture Notes in Computer Science, pages 478–503. Springer Berlin /
Heidelberg, 2010.
[PBLS15] Wolfgang J. Paul, Christoph Baumann, Petro Lutsyk, and Sabine Schmaltz.
System Architecture as an Ordinary Engineering Discipline. http:
//www-wjp.cs.uni-saarland.de/lehre/vorlesung/info2/ss15/
protected_material/sysbook.pdf, 2015.
[PBLS16] Wolfgang J. Paul, Christoph Baumann, Petro Lutsyk, and Sabine Schmaltz. System
Architecture. An Ordinary Engineering Discipline. Springer, 2016.
[Pen16] Hristo Pentchev. Sound Semantics of a High-Level Language with Interprocessor
Interrupts. PhD thesis, Saarland University, Saarbrücken, 2016.
[POS95] IEEE POSIX 1003.1c-1995. Standard, IEEE Computer Society, 1995.
[Sch13] Sabine Schmaltz. Towards the Pervasive Formal Verification of Multi-Core Operating
Systems and Hypervisors Implemented in C. PhD thesis, Saarland University, Saarbrücken,
2013.
[SF10] Zhong Shao and Bryan Ford. Advanced Development of Certified OS Kernels, 2010.
[Sha12] Andrey Shadrin. Mixed Low- and High Level Programming Languages Semantics.
Automated Verification of a Small Hypervisor: Putting It All Together. PhD thesis, Saarland
University, Saarbrücken, 2012.
[SS12] Sabine Schmaltz and Andrey Shadrin. Integrated semantics of intermediate-language
C and macro-assembler for pervasive formal verification of operating systems
and hypervisors from VerisoftXT. In Rajeev Joshi, Peter Müller, and Andreas
Podelski, editors, 4th International Conference on Verified Software: Theories, Tools, and
Experiments, VSTTE’12, volume 7152 of Lecture Notes in Computer Science, Philadelphia,
USA, 2012. Springer Berlin / Heidelberg.
[The11] The Verisoft XT Consortium. The Verisoft XT Project. http://www.verisoftxt.
de, 2011.
[TMu03] White Paper: Processor Affinity. Multiple CPU Scheduling. TMurgent Technologies,
November 2003.
[Tsy09] Alexandra Tsyban. Formal Verification of a Framework for Microkernel Programmers.
PhD thesis, Saarland University, Saarbrücken, 2009.
[Ver10] Verisoft Consortium. The Verisoft Project. http://www.verisoft.de/, 2010.
