Sound semantics of a high-level language with interprocessor interrupts by Pentchev, Hristo
Sound Semantics
of a High-Level Language
with
Interprocessor Interrupts
Dissertation for obtaining the degree of
Doctor of Engineering (Dr.-Ing.)
of the Faculties of Natural Sciences and Technology
of Saarland University

Hristo Pentchev
pentchev@wjpserver.cs.uni-saarland.de
Saarbrücken, January 2016

Day of the colloquium: 13.06.2016
Dean: Univ.-Prof. Dr. Frank-Olaf Schreyer
Chair of the examination committee: Prof. Dr. Jörg Siekmann
First reviewer: Prof. Dr. Wolfgang J. Paul
Second reviewer: Prof. Dr. Bernhard Beckert
Academic staff member: Dr. Johannes Hoffart

Abstract
Pervasive formal verification guarantees the highest reliability of complex
multi-core computer systems. This is required especially for safety-critical
applications in automotive, medical and military technologies. A crucial part of
formal verification is a profound understanding of all system layers and the
correct specification of their computational models and of the interaction
between software and hardware. The underlying architecture and the semantics of
the higher-level programs cannot be considered in isolation. In particular, when
the program execution relies on specific hardware features, these features have
to be integrated into the computational model of the programming language.
In this thesis, we present an integration approach for interprocessor
interrupts provided by multi-core architectures in the pervasive verification of
system software written in C. We define an extension to the semantics of a
high-level language which considers interprocessor interrupts. We prove
simulation between a multi-core hardware model and the high-level semantics with
interrupts. In this simulation, we assume interrupts to occur on the boundary
between statements. We justify that assumption by stating and proving an order
reduction theorem that reorders the interprocessor interrupt service routines to
dedicated consistency points.

Short Summary
Pervasive formal verification guarantees the highest reliability of complex
multi-core computer systems. This is indispensable in particular for
safety-critical applications in automotive, medical and military technology.
An essential part of formal verification is a profound understanding of all
layers of the model stack and the correct specification of their computational
models and of the interplay between software and hardware. The underlying
hardware architecture and the semantics of the abstract programming language
cannot be considered and analyzed in isolation from each other. In particular,
when program execution relies on properties of the hardware, these properties
have to be integrated into the computational model of the programming language.
In this thesis we present an integration approach for interprocessor
interrupts of multi-core architectures in the pervasive verification of system
software written in C. We define an extension of the semantics of a high-level
language that takes interprocessor interrupts into account. We prove simulation
between a multi-core hardware model and the high-level semantics with
interrupts. In this simulation we assume that interrupts occur on the boundary
between statements. We justify this assumption by stating and proving a
reduction theorem that reorders the interprocessor interrupt service routines
to dedicated consistency points.

Acknowledgements:
First I want to express my gratitude to Prof. Dr. Wolfgang J. Paul for supervising
my work, for his wise advice, and for all the valuable discussions about research
and life. I very much appreciate the cooperation within the Hyper-V verification
group of the Verisoft XT project and especially the prolific brainstorming
sessions with Mikhail Kovalev, Christoph Baumann and Sabine Schmaltz.
Last but not least, I thank my family for supporting me during the years.

Contents
1 Introduction
1.1 Related Work
1.2 Outline of the Thesis
1.3 Notation
2 Abstract Hardware Model
2.1 MIPSP Instruction Set
2.1.1 VMRUN
2.2 Configuration
2.3 Memory
2.4 Processor Core
2.4.1 Processor Core Configuration
2.4.2 Instruction Execution
2.4.2.1 Auxiliary Definitions for Instruction Decoding and Execution
2.4.2.2 Instruction Execution
2.4.3 Interrupt Processing
2.4.3.1 Auxiliary Definitions for Interrupt Processing
2.4.3.2 JISR
2.4.3.3 Return from Interrupt
2.4.4 Core Transitions
2.5 Advanced Programmable Interrupt Controller
2.5.1 APIC configuration
2.5.1.1 APIC Interrupt Command Register
2.5.1.2 Sending an IPI
2.5.1.3 Receiving an IPI
2.5.2 APIC Transition
2.6 Translation Lookaside Buffer
2.6.1 TLB configuration
2.6.2 TLB Transitions
2.7 MIPSP Virtualization Restrictions
2.8 Auxiliary Definitions
2.9 Transition
2.9.1 Execution Sequences
3 Reordering of Execution Sequences
3.1 MIPSP Extension
3.2 Ownership
3.2.1 Ownership Policy
3.2.2 Safe Execution
3.3 Auxiliary Definitions for Order Reduction
3.3.1 I/O Steps
3.3.2 Interleaving points
3.3.3 Execution Sequences
3.4 Instantiation of COSMOS with MIPSP
3.4.1 COSMOS Instantiation Interface
3.4.2 Instantiation Restriction for reads
3.4.3 Reordering Theorem
4 Interrupt Thread
4.1 Ownership XT
4.1.1 Ownership Policy XT
4.1.2 Safe Execution
4.2 Reordering Proof
4.2.1 Aux definitions
4.2.2 Simulation Relation
4.2.3 Simulation Theorem
5 C-IL Semantics
5.1 Sequential C-IL Semantics
5.1.1 C-IL Types
5.1.1.1 Primitive types
5.1.1.2 Complex types
5.1.1.3 Type qualifiers
5.1.2 Values
5.1.3 Expressions
5.1.4 Statements
5.1.5 Program
5.1.6 Configuration
5.1.7 Context
5.1.8 Memory
5.1.9 Auxiliary Definitions
5.1.10 Operational semantics
5.2 CC-IL Semantics
5.2.0.1 Auxiliary Functions and Notation
5.3 Compiler Correctness
5.3.1 Consistency Points
5.3.1.1 Software Consistency Points
5.3.1.2 MIPSP Consistency Points
5.3.2 Compiler Information
5.3.2.1 Memory Layout
5.3.3 Compiler Consistency
5.3.4 Safety
5.3.4.1 I/O Steps
5.3.5 Compiler Correctness Theorem
6 CC-IL+IPI Semantics
6.1 CC-IL+IPI operational semantics
6.1.1 C-IL Steps
6.1.2 JIPISR Step
6.1.3 IPI Step
6.2 CC-IL+IPI Safety
6.3 IPI Service Routine
6.4 CC-IL+IPI Simulation
6.4.1 CC-IL+IPI Simulation Relation
6.4.2 Consistency Points
6.4.3 CC-IL+IPI Simulation Theorem
7 Conclusion
8 Appendix: Modular Specification and Verification of Interprocess Communication
8.1 Introduction
8.2 Related Work
8.3 VCC Overview
8.4 A Polymorphic Specification of IPC
8.5 TLB Flush Example
8.6 Interprocessor Interrupts
8.7 Conclusion
Index
Bibliography

Chapter 1
Introduction
Modern computer systems have changed our lives completely. They provide a
vast range of services, which not only make everyday activities easier but also
create many new possibilities. The digitalization of all areas of our society is
remarkable. Smartphones have become an irreplaceable companion for almost two
billion people [Sta]. Digital technologies are disrupting conventional
businesses in all industry sectors from publishing to banking. All this is only
possible due to the rapid evolution of hardware and the simultaneous development
of software. The developers of microprocessors put more and more computing cores
on a chip. This requires parallel software to make use of the available
computational power. Parallelization, however, dramatically increases the
complexity of the system. A Fraunhofer developer survey in 2010 has shown that
less than twenty percent of software developers claim to have good expertise in
multicore programming [Heb10]. These facts lead to the conclusion that modern
computer systems are faulty.
A blue screen on a personal computer is admittedly not a tragedy, but software
and hardware failures have already caused losses of hundreds of millions (e.g.
as a result of Intel's well-known Pentium bug [Pra95]). In the worst cases
software failures are even responsible for casualties (e.g. the incidents with
the radiation therapy machine Therac-25 [LT93]).
Since modern complex computer systems are embedded in safety-critical
applications in automotive, medical and military technologies, failures must be
eliminated or at least minimized.
One way to increase the safety and reliability of systems is testing. It is
widely established in the industry and implements sophisticated methods.
Unfortunately the absence of failures in a tested system can only be claimed for
the limited set of simulated system executions. In a parallel system with
arbitrary interleaving the set of all possible executions is enormous. Thus even
extensive tests of complex systems cannot guarantee correctness.
Another approach is to apply formal methods and verification tools to provide
a mathematically based correctness proof. In this approach the systems are
analyzed and an abstract model is specified. Then all possible executions of the
system are proven to obey the formal specification. That way the correctness of
the system is verified against the specification. Two major factors for the
consistency of formal verification are the correctness of the specified abstract
model and the soundness of the verification tool. The latter means that if a
property is established by the tool, then it indeed holds for all executions. In
other words, formal methods rely on a precise computational model. If the
computational model of the verification tool is wrong, verification results are
faulty. If we consider solely software and code-verification tools, then this
computational model is defined by the semantics of the programming language.
With code-verification tools one can prove the correctness of programs, but this
is not enough to state the correctness of a system containing software and
hardware. We need to examine the behavior of the whole system, including
software and hardware, and to define a pervasive theory. For instance, in
low-level system software, software and hardware are closely coupled, which
makes writing system software a challenging task. Software and hardware operate
together, and hardware characteristics influence the software execution
massively. Thus in order to program reliable system software, one has to
consider software and hardware in a mixed computational model. Moreover, the
verification of properties of system software is impossible in the absence of a
programming language semantics that incorporates hardware features (e.g.
interrupts and the memory management unit).
System verification was the focus of the Verisoft XT project [Ver]. One of the
goals of the project was to prove the correctness of Microsoft's Hyper-V™
hypervisor. Hyper-V™ is a hypervisor for x86-64 virtualization. Like most system
software, Hyper-V™ is written mostly in C. For that purpose Microsoft, with the
help of the project partners, developed the concurrent C verification tool VCC
during Verisoft XT. In the scope of the project large portions of Hyper-V™ were
verified in VCC. The soundness proof of VCC was sketched but not completed. The
development of a sound system verification method and the definition of the
missing semantics continued after the project at the chair of Professor Paul. A
multi-core model stack was defined to cover the different levels of abstraction
of the system. It contains a hardware model of a multi-core MIPS machine, an
ownership-based order reduction theorem [Bau14], extended semantics for the C
Intermediate Language (C-IL) with Ghost [Sch13], a mixed low- and high-level
programming language semantics [Sha12] and a TLB virtualization method [Kov13].
The results are described in several PhD theses, and an attempt to summarize
everything is made in the lecture notes of the current Multicore System
Architecture lecture [PBLS16].
The integration of interprocess communication based on interprocessor
interrupts (IPIs) into the model stack is handled in this work. We define a
semantic model of C-IL with interrupts and justify this model against executions
of a multi-core MIPS machine with local APICs. With this model we justify the
VCC verification of an expressive generic interprocess communication protocol.
During the Verisoft XT project the approach was applied, without the
justification presented in this thesis, in an academic hypervisor and in
Microsoft Hyper-V™ (see the appendix in Chapter 8).
1.1 Related Work
The only work we are aware of about C semantics with interrupts is presented
in [PBLS16]. Previous, less general versions of this semantics were presented
in [Alk09], [Sta10], [PSS12], and [Sha12].
In [PBLS16] the authors define semantics for C + assembly + interrupts.
There, interrupts are disabled during the execution of C portions and are only
visible during the execution of the assembly portions of the program. The
assembly code polls for the interrupt. This is sufficient for a special case of
interrupt handling. In this thesis we define a more general approach in which
interrupts are integrated into the C semantics and are visible on the border of
C statements.
For transferring the properties from C to ISA we use the model stack. A
similar model stack, but so far without interrupts, is used in the formally
verified work reported in [App12]. Furthermore, the work presented there is
based on sequential compiler correctness and is not transferred to the
concurrent case.
In order to prove simulation we need to reorder ISA executions into a suitable
schedule. The starting point for this order reduction theorem lies in [Bau14].
The generic order reduction theorem presented there, however, does not consider
interrupts.
1.2 Outline of the Thesis
In Chapter 2 we present MIPSP - a simplified multiprocessor MIPS instruction
set architecture (ISA) machine. We define the MIPSP configuration, present
its components and the overall MIPSP transition function in order to define
execution sequences of the machine. In Chapter 3 we present an ownership-based
order reduction theorem to justify the reordering of arbitrarily interleaved
MIPSP executions into schedules suitable for the application of the compiler
correctness theorem. We define an ownership model, instantiate a generic model
called COSMOS with MIPSP and apply the COSMOS order reduction theorem. In
Chapter 4 we introduce interrupt threads and refine our ownership model. We
state and prove an order reduction theorem to reorder handler executions such
that interrupts occur at consistency points.
In Chapter 5 we present semantics for a programming language similar to C - the
C Intermediate Language (C-IL). We state a concurrent compiler correctness
theorem in the absence of interrupts similar to [Sha12] and [Kov13]. In
Chapter 6 we extend C-IL with a component mirroring the local APIC state. We
state and prove a concurrent simulation theorem between MIPSP executions and
executions of the extended C-IL. This work is concluded in Chapter 7.
1.3 Notation
Definition 1.1 (Hilbert-Choice Operator)
To choose an arbitrary element of a given set $A$ we use the Hilbert-choice
operator $\varepsilon$.
\[ \varepsilon A \in A \]
If the given set consists of a single element, then
\[ \varepsilon \{x\} = x . \]
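Definition 1.2 (Natural Numbers)
We denote by $\mathbb{N}$ the set of natural numbers (with zero).
\[ \mathbb{N} \stackrel{\mathrm{def}}{=} \{0, 1, 2, \ldots\} \]
We denote by $\mathbb{N}^+$ the set of positive natural numbers.
\[ \mathbb{N}^+ \stackrel{\mathrm{def}}{=} \{1, 2, \ldots\} \]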
Definition 1.3 (Boolean Values)
We denote by $\mathbb{B}$ the set of Boolean values.
\[ \mathbb{B} \stackrel{\mathrm{def}}{=} \{0, 1\} \]
We also use the term bit to refer to a Boolean value.
Definition 1.4 (Power Set)
By $2^A$ we denote the power set of a given set $A$, i.e. the set of all subsets
of $A$.
\[ 2^A \stackrel{\mathrm{def}}{=} \{B \mid B \subseteq A\} \]
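Definition 1.5 (Interval of Natural Numbers)
We denote by $[i : j]$ the interval of natural numbers from $i$ to $j$.
\[ [i : j] \in 2^{\mathbb{N}} \stackrel{\mathrm{def}}{=}
\begin{cases}
\emptyset & \text{if } i > j \\
\{i, i+1, \ldots, j\} & \text{if } i \le j
\end{cases} \]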
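Definition 1.6 (Finite Sequences)
We denote empty sequences by $\varepsilon$. We index the elements of a finite
sequence $\beta$ of $n$ elements from a given set $A$ from left to right and
start indices at 0.
\[ \beta \stackrel{\mathrm{def}}{=} \beta_0 \beta_1 \beta_2 \ldots \beta_{n-1} \]
We denote single elements of the sequence by $\beta_i$ or $\beta[i]$, and the
subsequence with elements from $\beta_i$ to $\beta_j$ by $\beta[i : j]$.
\[ \beta[i : j] \stackrel{\mathrm{def}}{=}
\begin{cases}
\varepsilon & \text{if } i > j \\
\beta_i & \text{if } i = j \\
\beta_i \, \beta[i+1 : j] & \text{otherwise}
\end{cases} \]
We denote by $|\beta|$ the number of elements in $\beta$ (i.e. the length of
$\beta$).
\[ |\beta| \stackrel{\mathrm{def}}{=} n \]
The set of all sequences of elements from $A$ with length $n$ we denote by $A^n$.
\[ A^n \stackrel{\mathrm{def}}{=} \{\beta \mid (|\beta| = n) \wedge \forall i \in [0 : n-1].\ \beta_i \in A\} \]
By $A^*$ we denote the set of arbitrarily long finite sequences of elements
from $A$.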
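The subsequence notation of Definition 1.6 includes both bounds, which differs from the half-open slices of many programming languages. The following Python sketch mirrors the recursive definition; the function name `subseq` and the use of Python lists for sequences are assumptions made for this illustration only.

```python
def subseq(beta, i, j):
    """Return beta[i : j] in the sense of Definition 1.6 (both bounds inclusive)."""
    if i > j:
        return []                      # empty sequence
    # beta_i followed by beta[i+1 : j]; the case i == j ends in the empty tail
    return [beta[i]] + subseq(beta, i + 1, j)


beta = [10, 20, 30, 40]
middle = subseq(beta, 1, 2)            # elements beta_1 and beta_2
empty = subseq(beta, 3, 1)             # i > j gives the empty sequence
```

For valid indices the result coincides with the Python slice `beta[i : j + 1]`.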
Definition 1.7 (Bit-Strings)
A finite sequence $b$ of Boolean values (bits) of length $n$ we call a bit-string
(or bit vector).
\[ b \in \mathbb{B}^n \]
For a given bit-string $b$ of length $n$ we define its value as a natural number
by
\[ \langle b \rangle \stackrel{\mathrm{def}}{=} \sum_{i=0}^{n-1} b_i \cdot 2^i . \]
The binary representation of a given natural number $k$ as an $n$-bit string,
where $k \in [0 : 2^n - 1]$, we define by
\[ bin_n(k) \in \mathbb{B}^n \stackrel{\mathrm{def}}{=} \varepsilon \{b \mid (b \in \mathbb{B}^n) \wedge (\langle b \rangle = k)\} . \]
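As a small illustration of Definition 1.7, the following Python sketch computes the value of a bit-string and the n-bit binary representation of a number. The representation is an assumption of the example: a bit-string is a Python list in which position i holds bit b_i, so the list follows the index order of the sum rather than the usual most-significant-bit-first way of writing binary numbers. The names `value` and `bin_n` are illustrative.

```python
def value(b):
    """Return <b>, the sum of b[i] * 2**i over all positions i."""
    return sum(bit << i for i, bit in enumerate(b))


def bin_n(n, k):
    """Return the unique list b of n bits with value(b) == k."""
    if not 0 <= k < 2 ** n:
        raise ValueError("k must lie in [0 : 2^n - 1]")
    return [(k >> i) & 1 for i in range(n)]


five = value([1, 0, 1])        # bits b_0 = 1, b_1 = 0, b_2 = 1
bits = bin_n(4, 5)             # 4-bit representation of 5, bit 0 first
```

Since the representation is unique for k in the stated range, the Hilbert-choice operator in the definition of bin_n selects from a one-element set.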
Definition 1.8 (Records)
A record is a tuple of named components and their types. We define a record
$R$ with components $a$ and $b$ of types $A$ and $B$ by
\[ R \stackrel{\mathrm{def}}{=} [a \in A,\ b \in B] \]
and access the elements by $R.a$ and $R.b$. We update the components of a record
$r \in R$ by
\[ r' = r[a \mapsto a',\ b \mapsto b'] \]
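The record notation of Definition 1.8 corresponds closely to immutable record types in programming languages. A minimal Python sketch, assuming the component types int and str for A and B (chosen only for this example), uses a frozen dataclass and `dataclasses.replace` for the update:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class R:
    a: int   # component a, with the assumed type A = int
    b: str   # component b, with the assumed type B = str


r = R(a=1, b="x")
# r' = r[a -> 2, b -> "y"]: a new record is built, r itself is unchanged
r_prime = replace(r, a=2, b="y")
```

As in the definition, the update yields a new record r' and leaves r untouched.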

Chapter 2
Abstract Hardware Model
In this work we present a stripped-down, concise model that on the one hand
is functional enough to demonstrate our goals and on the other hand is simple
enough not to shift the focus of the thesis.
MIPSP is a simplified multiprocessor MIPS instruction set architecture (ISA)
machine. Similar models have been specified in the scope of the Verisoft XT
project and the subsequent research.
In [Deg11] Ulan Degenbaev gives a formal model of the x64 ISA.
In [Sch13] Sabine Schmaltz defines a multiprocessor MIPS model called
MIPS-86. MIPS-86 is basically a multiprocessor version of the sequential
processor model from [KMP14] extended with MMU, TLB, APIC and device models,
which are motivated by and similar to x86 architectures.
We stay closest to MIPS-86. The main simplifications compared to MIPS-86
affect devices and MMUs. We also assume that all processors have already been
booted and are running. Readers interested in the omitted details can look them
up in [Sch13].
MIPS-86 processors run in two modes: user mode with address translation and
system mode without address translation. Memory management unit steps can only
be made in user mode. We are interested only in properties of system code that
runs in untranslated mode. These properties concern only a small fraction of
the MMU state, which allows us to use a strongly simplified model.
Significant parts of the simplifications are based on the results of joint work
started in Verisoft XT. In [DPS09] Ulan Degenbaev, Wolfgang Paul and Norbert
Schirmer sketch cache, store buffer (SB) and TLB reduction theorems and basic
compiler consistency. In [Kov13] Mikhail Kovalev has shown that, if a
programming discipline is followed, store buffers are transparent to the
hypervisor and can be omitted from the model. He also provided proofs for memory
virtualization covering the behavior of the MMU.
In MIPSP we omitted devices since they do not interfere with IPIs. Although
device signals are delivered to the core over the APIC, they do not influence
the IPIs because of their lower priority. Our model can be extended with devices
without significantly changing any theorem or proof that we present in this
work.
We refine the specification of MIPS-86 concerning guest execution. MIPS-86
does not provide instructions for virtualization support or an intercept model,
both of which are necessary for virtualization. We add the VMRUN instruction for
virtualization support to the instruction set, but still do not provide detailed
formal definitions for guest execution.
In the next section we provide a summary of the instruction set of our model.
Then in Section 2.2 we define the configuration of a MIPSP machine. In the
subsequent sections we present the semantics of the different MIPSP components
and their local transition functions. Finally we define the overall transition function
of the machine as a composition of the steps of its sub-components.
2.1 MIPSP Instruction Set
The MIPSP instruction set consists of instructions which we separate into three
types.
• I-Type instructions operate on two registers and an immediate constant.
• R-Type instructions operate on three registers.
• J-Type instructions implement jumps in the program to a given address
passed as an immediate constant.
Every instruction type has a specific layout. We define the instruction layouts
in Table 2.1, Table 2.2 and Table 2.3. The opcode together with the function
code (relevant only for R-Type instructions) encodes the operation. rs, rt and
rd define register addresses. sa, imm and iindex store the immediate constant
operands.

Bits        31...26  25...21  20...16  15...0
Field Name  opcode   rs       rt       immediate constant imm
Table 2.1: I-Type Instruction Layout.

Bits        31...26  25...21  20...16  15...11  10...6          5...0
Field Name  opcode   rs       rt       rd       sa              fun
                                                (shift amount)  (function code)
Table 2.2: R-Type Instruction Layout.

Bits        31...26  25...0
Field Name  opcode   instruction index iindex
Table 2.3: J-Type Instruction Layout.
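The three layouts above can be read as bit-field extractions from a 32-bit instruction word. The following Python sketch shows this under the assumption that an instruction is given as a non-negative integer whose bit 31 is the most significant bit; the function name `fields` and the dictionary result are illustrative choices, not part of the model.

```python
def fields(word):
    """Split a 32-bit instruction word into the fields of Tables 2.1 to 2.3."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31...26
        "rs":     (word >> 21) & 0x1F,   # bits 25...21
        "rt":     (word >> 16) & 0x1F,   # bits 20...16
        "rd":     (word >> 11) & 0x1F,   # bits 15...11 (R-Type)
        "sa":     (word >> 6) & 0x1F,    # bits 10...6  (R-Type)
        "fun":    word & 0x3F,           # bits 5...0   (R-Type)
        "imm":    word & 0xFFFF,         # bits 15...0  (I-Type)
        "iindex": word & 0x3FFFFFF,      # bits 25...0  (J-Type)
    }


# Word with opcode 000000, rs = 1, rt = 2, rd = 3, sa = 0, fun = 100000
f = fields(0x00221820)
```

All fields are extracted unconditionally; which of them are meaningful depends on the instruction type.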
The set of MIPSP ISA instructions is defined in Table 2.4, Table 2.5 and
Table 2.6. The tables contain the encoding of the instructions, their assembler
syntax and a coarse overview of their semantics. The full semantics of the given
instructions is defined later in Section 2.4.2.

opcode   Mnemonic  Assembler-Syntax  Effect
Data Transfer
100 000  lb        lb rt rs imm      rt = sxt(m1(rs + sxt(imm)))
100 001  lh        lh rt rs imm      rt = sxt(m2(rs + sxt(imm)))
100 011  lw        lw rt rs imm      rt = m4(rs + sxt(imm))
100 100  lbu       lbu rt rs imm     rt = zxt(m1(rs + sxt(imm)))
100 101  lhu       lhu rt rs imm     rt = zxt(m2(rs + sxt(imm)))
101 000  sb        sb rt rs imm      m1(rs + sxt(imm)) = rt[7:0]
101 001  sh        sh rt rs imm      m2(rs + sxt(imm)) = rt[15:0]
101 011  sw        sw rt rs imm      m4(rs + sxt(imm)) = rt
Arithmetic, Logical Operation, Test-and-Set
001 000  addi      addi rt rs imm    rt = rs + sxt(imm)
001 001  addiu     addiu rt rs imm   rt = rs + sxt(imm)
001 010  slti      slti rt rs imm    rt = (rs < sxt(imm) ? 1 : 0)
001 011  sltui     sltui rt rs imm   rt = (rs < zxt(imm) ? 1 : 0)
001 100  andi      andi rt rs imm    rt = rs ∧ zxt(imm)
001 101  ori       ori rt rs imm     rt = rs ∨ zxt(imm)
001 110  xori      xori rt rs imm    rt = rs ⊕ zxt(imm)
001 111  lui       lui rt imm        rt = imm 0^16
Branch
000 001  bltz      bltz rs imm       pc = pc + (rs < 0 ? imm00 : 4)
000 001  bgez      bgez rs imm       pc = pc + (rs ≥ 0 ? imm00 : 4)
000 100  beq       beq rs rt imm     pc = pc + (rs = rt ? imm00 : 4)
000 101  bne       bne rs rt imm     pc = pc + (rs ≠ rt ? imm00 : 4)
000 110  blez      blez rs imm       pc = pc + (rs ≤ 0 ? imm00 : 4)
000 111  bgtz      bgtz rs imm       pc = pc + (rs > 0 ? imm00 : 4)
Table 2.4: I-Type Instructions of MIPS.
Note: To distinguish between the branch instructions with the same opcode (bltz
and bgez) we additionally use the rt field.
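The effects in Table 2.4 rely on sign extension (sxt) and zero extension (zxt) of the 16-bit immediate. The following Python sketch illustrates the difference for addi, ori and lui. It assumes that sxt and zxt are the usual extensions to 32 bits, that register values are represented as unsigned 32-bit integers, and that addition wraps modulo 2^32; overflow handling is not part of this sketch, and all function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def zxt16(imm):
    """Zero-extend a 16-bit immediate to 32 bits."""
    return imm & 0xFFFF


def sxt16(imm):
    """Sign-extend a 16-bit immediate to 32 bits (two's complement)."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def addi(rs, imm):
    """rt = rs + sxt(imm), computed modulo 2**32 (assumption of this sketch)."""
    return (rs + sxt16(imm)) & MASK32


def ori(rs, imm):
    """rt = rs OR zxt(imm)."""
    return rs | zxt16(imm)


def lui(imm):
    """rt = imm followed by sixteen zero bits."""
    return (imm & 0xFFFF) << 16


decremented = addi(5, 0xFFFF)    # immediate 0xFFFF acts as -1 after sign extension
low_half = ori(0, 0xFFFF)        # the same immediate stays 0x0000FFFF when zero-extended
```

A common idiom built from these two effects is loading a full 32-bit constant with lui followed by ori.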
opcode   Mnemonic  Assembler-Syntax  Effect
Jumps
000 010  j         j iindex          pc = bin32(pc+4)[31:28] iindex 00
000 011  jal       jal iindex        R31 = pc + 4
                                     pc = bin32(pc+4)[31:28] iindex 00
Table 2.5: J-Type Instructions of MIPS.
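The jump target in Table 2.5 is a concatenation of three parts: the upper four bits of pc + 4, the 26-bit instruction index, and two zero bits. A minimal Python sketch of this computation, assuming the program counter is represented as an unsigned 32-bit integer and using the illustrative names `jump_target` and `jal`:

```python
def jump_target(pc, iindex):
    """Return bin32(pc + 4)[31:28] concatenated with iindex and 00."""
    upper = (pc + 4) & 0xF0000000          # bits 31...28 of pc + 4
    return upper | ((iindex & 0x3FFFFFF) << 2)


def jal(pc, iindex):
    """Return (new value of R31, new pc) for a jump-and-link instruction."""
    return (pc + 4) & 0xFFFFFFFF, jump_target(pc, iindex)


target = jump_target(0x10000008, 0x10)
link, new_pc = jal(0x10000008, 0x10)
```

The two appended zero bits keep the target word-aligned, and the jump stays within the 256 MB region selected by the upper four bits of pc + 4.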
2.1.1 VMRUN
The virtualization instruction VMRUN implements the context switch from hy-
pervisor to guest. In our model VMRUN is an abstraction of the context switch
in X86, which in reality consists of more than just a single VMRUN instruction.
In MIPSP VMRUN saves the state of the hypervisor in the memory, restores the
guest state and changes the mode of the processor from system to user mode.
The formal specification of VMRUN and other virtualization extensions and their
integration into the current MIPS model is future work.
2.2 Configuration
Definition 2.1 (Abstract Hardware Configuration / Abstracted Machine State)
A MIPSP configuration $c$ of our abstract machine has the type $C_M$
\[ C_M \stackrel{\mathrm{def}}{=} [cpu \in Pid \rightarrow C_{CPU},\ m \in C_{MEM}] \]
and contains:
• c.cpu - a mapping from processor identifiers to processor configurations,
where $Pid = [0 : np - 1]$ and $np \in \mathbb{N}^+$ is a parameter defining the
number of processors in the system, and
• c.m - a global memory.
Definition 2.2 (CPU Configuration)
A processor configuration
\[ C_{CPU} \stackrel{\mathrm{def}}{=} [core \in C_{CORE},\ tlb \in C_{TLB},\ apic \in C_{APIC}] \]
consists of:
• a core, executing instructions,
• a translation lookaside buffer (TLB), caching address translations,
• an advanced programmable interrupt controller (APIC), sending and receiving
interrupts.
opcode  fun      Mnemonic  Assembler-Syntax  Effect
Shift Operation
000000  000 000  sll       sll rd rt sa      rd = sll(rt,sa)
000000  000 010  srl       srl rd rt sa      rd = srl(rt,sa)
000000  000 011  sra       sra rd rt sa      rd = sra(rt,sa)
000000  000 100  sllv      sllv rd rt rs     rd = sll(rt,rs)
000000  000 110  srlv      srlv rd rt rs     rd = srl(rt,rs)
000000  000 111  srav      srav rd rt rs     rd = sra(rt,rs)
Arithmetic, Logical Operation
000000  100 000  add       add rd rs rt      rd = rs + rt
000000  100 001  addu      addu rd rs rt     rd = rs + rt
000000  100 010  sub       sub rd rs rt      rd = rs − rt
000000  100 011  subu      subu rd rs rt     rd = rs − rt
000000  100 100  and       and rd rs rt      rd = rs ∧ rt
000000  100 101  or        or rd rs rt       rd = rs ∨ rt
000000  100 110  xor       xor rd rs rt      rd = rs ⊕ rt
000000  100 111  nor       nor rd rs rt      rd = ¬(rs ∨ rt)
Test Set Operation
000000  101 010  slt       slt rd rs rt      rd = (rs < rt ? 1 : 0)
000000  101 011  sltu      sltu rd rs rt     rd = (rs < rt ? 1 : 0)
Jumps
000000  001 000  jr        jr rs             pc = rs
000000  001 001  jalr      jalr rd rs        rd = pc + 4; pc = rs
Synchronizing Memory Operations
000000  111 111  rmw       rmw rd rs rt      rd' = m
                                             m' = (rd = m ? rt : m)
Virtualization Instructions
000000  111 110  vmrun     vmrun
TLB Instructions
000000  111 101  flush     flush             flushes TLB
Coprocessor Instructions
opcode  rs     fun      Mnemonic  Assembler-Syntax  Effect
010000  10000  011 000  eret      eret              Exception Return
010000  00100           movg2s    movg2s rd rt      spr[rd] := gpr[rt]
010000  00000           movs2g    movs2g rd rt      gpr[rt] := spr[rd]
Table 2.6: R-Type Instructions of MIPS.
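The rmw row of Table 2.6 describes a compare-and-swap: the old memory word is always returned in rd, and the memory is overwritten with rt only if it matched the previous content of rd. The following Python sketch spells this out. The table abbreviates the accessed memory word as m; the sketch passes it directly as the parameter `mem_word`, which is an assumption of the example, as are the function name and the lock values 0 and 1.

```python
def rmw(rd, rt, mem_word):
    """Return (rd', m') for one read-modify-write step.

    rd' = m, and m' = rt if rd equals m, otherwise m stays unchanged.
    """
    new_rd = mem_word
    new_mem = rt if rd == mem_word else mem_word
    return new_rd, new_mem


# Hypothetical lock protocol: 0 means free, 1 means taken.
got_free = rmw(rd=0, rt=1, mem_word=0)    # lock was free: memory becomes 1
got_taken = rmw(rd=0, rt=1, mem_word=1)   # lock was taken: memory unchanged
```

A caller can detect success by comparing the returned register value with the value it expected.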
Figure 2.1: Abstract machine architecture (CPU i, CPU j, ..., each consisting of
Core, APIC and TLB; System Memory; APIC Bus).
2.3 Memory
Definition 2.3 (Memory Configuration)
The global memory is a mapping of 32-bit addresses to bytes.
\[ C_{MEM} \stackrel{\mathrm{def}}{=} \mathbb{B}^{32} \rightarrow \mathbb{B}^{8} \]
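Definition 2.4 (Reading Memory)
The content of $n \in \mathbb{N}^+$ consecutive memory cells starting at address
$a \in \mathbb{B}^{32}$ of a memory $m \in C_{MEM}$ is defined by the following
notation.
\[ m_n(a \in \mathbb{B}^{32}) \in \mathbb{B}^{8 \cdot n} \stackrel{\mathrm{def}}{=}
\begin{cases}
m_{n-1}(a + 1) \circ m(a) & n > 0 \\
\varepsilon & n = 0
\end{cases} \]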
Definition 2.5 (Writing Memory)
Changes of the memory content in memory $m$ are applied by the function write
\[ write(m \in C_{MEM},\ a \in \mathbb{B}^{32},\ v \in \mathbb{B}^{8*},\ c \in \mathbb{B}^{32} \cup \{\bot\}) \in C_{MEM} \stackrel{\mathrm{def}}{=}
\begin{cases}
m[addr \rightarrow byte_i(v)] & c = \bot \\
 & \wedge\ addr \in [a : a + (|v|/8) - 1] \\
 & \wedge\ i = \langle addr \rangle - \langle a \rangle \\
m[addr \rightarrow byte_i(v)] & c \in \mathbb{B}^{32} \wedge v \in \mathbb{B}^{32} \\
 & \wedge\ addr \in [a : a + 3] \\
 & \wedge\ i = \langle addr \rangle - \langle a \rangle \\
 & \wedge\ c = m_4(a) \\
m & \text{otherwise}
\end{cases} \]
where $a$ is the address of the first byte to be written, $v$ is the new value and
$c$ is the compare value in case of a read-modify-write access.
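Definitions 2.4 and 2.5 can be sketched in Python as follows. Several representation choices are assumptions of the example and not part of the model: memory is a sparse dictionary from addresses to byte values with unset cells reading as 0, the value v is a list of bytes with byte 0 first, None stands for the bottom value, addresses wrap modulo 2^32, and the rightmost byte m(a) of the concatenation is read as the least significant one.

```python
MASK32 = 0xFFFFFFFF


def read(m, a, n):
    """m_n(a): the n bytes at addresses a, a+1, ..., with m(a) least significant."""
    value = 0
    for k in range(n):
        value |= m.get((a + k) & MASK32, 0) << (8 * k)
    return value


def write(m, a, v, c=None):
    """Return the memory after writing the byte list v starting at address a.

    c=None is a plain write. Otherwise the write is a compare-and-write of one
    word: it takes effect only if v has four bytes and the word at a equals c.
    """
    if c is not None and (len(v) != 4 or read(m, a, 4) != c):
        return m                           # "otherwise" case: memory unchanged
    updated = dict(m)                      # leave the old memory untouched
    for i, byte in enumerate(v):
        updated[(a + i) & MASK32] = byte & 0xFF
    return updated


m0 = {}
m1 = write(m0, 0x10, [0x78, 0x56, 0x34, 0x12])       # plain word write
m2 = write(m1, 0x10, [1, 0, 0, 0], c=0)              # compare fails
m3 = write(m1, 0x10, [1, 0, 0, 0], c=0x12345678)     # compare succeeds
```

The compare-and-write case is the memory-side counterpart of the rmw instruction: the update happens only if the current word equals the compare value.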
Definition 2.6 (Memory Transition Function)
The only changes of the memory state originate from memory writes. For that
reason the memory transition function $\delta_{mem}$ is defined completely by the
memory write function.
\[ \delta_{mem}(m \in C_{MEM},\ a \in \mathbb{B}^{32},\ v \in \mathbb{B}^{8*},\ c \in \mathbb{B}^{32} \cup \{\bot\}) \in C_{MEM} \stackrel{\mathrm{def}}{=} write(m, a, v, c) \]
2.4 Processor Core
2.4.1 Processor Core Configuration
Definition 2.7 (Core Configuration)
The processor core configuration $C_{CORE}$ consists of:
• a program counter pc storing a 32-bit pointer to the next instruction,
• a general purpose register file gpr of 32 registers, each 32 bits wide,
• a special purpose register file spr of 32 registers, each 32 bits wide.
\[ C_{CORE} \stackrel{\mathrm{def}}{=} [pc \in \mathbb{B}^{32},\ gpr \in \mathbb{B}^{5} \rightarrow \mathbb{B}^{32},\ spr \in \mathbb{B}^{5} \rightarrow \mathbb{B}^{32}] \]
When the processor is making a step, we have basically two possibilities. It
either executes the next instruction, or it performs a jump to the interrupt service
routine (JISR). Before defining the transition function of the core we first consider
these two cases. Later in Section 2.4.4 we present the core transition function.
Table 2.7: MIPS Special Purpose Registers.
Index  Alias  Usage
0      sr     status register (contains masks to enable/disable maskable interrupts)
1      esr    exception sr
2      eca    exception cause register
3      epc    exception pc (address to return to after interrupt handling)
4      edata  exception data
7      mode   mode register $\in \{0^{31}1, 0^{32}\}$
2.4.2 Instruction Execution
In the following we introduce some additional notations and auxiliary functions
which we need in the formal definition of the instruction execution.
2.4.2.1 Auxiliary Definitions for Instruction Decoding and Execution
Definition 2.8 (Instruction Decoding)
We define the following predicates, which classify an instruction by its opcode.
\[ rtype(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} I[31:26] \in \{0^{6}, 010^{4}, 01^{3}0^{2}\} \]
\[ jtype(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} I[31:26] \in \{0^{4}10, 0^{4}11\} \]
\[ itype(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} \neg(rtype(I) \lor jtype(I)) \]
The instruction opcode (together with the function code for R-type instruc-
tions) identifies the corresponding instructions. From the ISA tables in Section
2.1 one can easily define decode predicates that check the opcode (and function
code) of the current instruction. The naming convention is to name the predicates
with the mnemonic of the corresponding instruction from the tables. Here we list
as an example the predicates for jump, store word and addition.
\[ j(I) \stackrel{def}{=} I[31:26] = 000010 \]
\[ sw(I) \stackrel{def}{=} I[31:26] = 101011 \]
\[ add(I) \stackrel{def}{=} I[31:26] = 001000 \land I[5:0] = 10^{3}11 \]
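The type predicates and two of the example decode predicates can be sketched with plain bit operations. The Python below is an illustration only; it represents an instruction word as an integer with bit 0 as the least significant bit, and the helper names bits and opcode are hypothetical.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an integer (bit 0 is the least significant bit)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def opcode(i):
    return bits(i, 31, 26)

def rtype(i):
    return opcode(i) in (0b000000, 0b010000, 0b011100)

def jtype(i):
    return opcode(i) in (0b000010, 0b000011)

def itype(i):
    return not (rtype(i) or jtype(i))

def j(i):
    return opcode(i) == 0b000010

def sw(i):
    return opcode(i) == 0b101011

# hypothetical instruction words: only the opcode field matters here
store_word = (0b101011 << 26) | 0x1234
jump_word = 0b000010 << 26
```

Every word is classified by exactly one of the three type predicates, because itype is defined as the complement of the other two.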
Definition 2.9 (Instruction Layout Fields)
Furthermore, following the instruction layout we define shorthands for accessing
instruction fields.
• Register address of the target (rt), source (rs) and destination (rd) register.
\[ rt(I \in \mathbb{B}^{32}) \in \mathbb{B}^{5} \stackrel{def}{=} I[20:16] \]
\[ rs(I \in \mathbb{B}^{32}) \in \mathbb{B}^{5} \stackrel{def}{=} I[25:21] \]
\[ rd(I \in \mathbb{B}^{32}) \in \mathbb{B}^{5} \stackrel{def}{=} I[15:11] \]
• Immediate constants for I-type instructions.
\[ imm(I \in \mathbb{B}^{32}) \in \mathbb{B}^{16} \stackrel{def}{=} I[15:0] \]
• Immediate constants for J-type instructions.
\[ iindex(I \in \mathbb{B}^{32}) \in \mathbb{B}^{26} \stackrel{def}{=} I[25:0] \]
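The field shorthands translate directly into shifts and masks. The following Python sketch assumes an instruction word given as an integer with bit 0 as the least significant bit; the helper bits and the example word are hypothetical.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an integer (bit 0 is the least significant bit)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def rs(i):
    return bits(i, 25, 21)

def rt(i):
    return bits(i, 20, 16)

def rd(i):
    return bits(i, 15, 11)

def imm(i):
    return bits(i, 15, 0)

def iindex(i):
    return bits(i, 25, 0)

# hypothetical word: opcode 101011, rs = 3, rt = 7, imm = 0x0010
word = (0b101011 << 26) | (3 << 21) | (7 << 16) | 0x0010
```

Note that the fields overlap: rd and imm both read from the low 16 bits, so which of them is meaningful depends on the instruction type.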
Memory Operations
Definition 2.10 (Memory Operations)
The instructions which require memory access are the store instructions
\[ store(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} sw(I) \lor sh(I) \lor sb(I), \]
the load instructions
\[ load(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} lw(I) \lor lh(I) \lor lb(I) \lor lhu(I) \lor lbu(I) \]
and read-modify-write rmw(I).
They access one or several bytes in the memory starting with the byte at
the address defined by the function ea, which makes a case distinction on the
instruction type.
\[ ea(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
c.gpr(rs(I)) & \text{if } rtype(I) \\
c.gpr(rs(I)) +_{32} sxt_{32}(imm(I)) & \text{if } itype(I)
\end{cases} \]
The number of accessed bytes is 1, 2 or 4 and is defined by:
\[ d(I \in \mathbb{B}^{32}) \in \mathbb{N} \stackrel{def}{=}
\begin{cases}
1 & \text{if } lb(I) \lor lbu(I) \lor sb(I) \\
2 & \text{if } lh(I) \lor lhu(I) \lor sh(I) \\
4 & \text{if } lw(I) \lor sw(I) \lor rmw(I)
\end{cases} \]
In case of a store instruction, the low-order bytes of the target register contain
the value to be written to the memory. Formally we define the store value by the
function sv.
\[ sv(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{8 \cdot d(I)} \stackrel{def}{=} c.gpr(rt(I))[8 \cdot d(I) - 1 : 0] \]
The value read from the memory and passed to the transition function as a
parameter $R \in \mathbb{B}^{8} \cup \mathbb{B}^{16} \cup \mathbb{B}^{32}$ is extended to the width of a register (i.e. 32) by
the function lv.
\[ lv(I \in \mathbb{B}^{32}, R \in \mathbb{B}^{8 \cdot d(I)}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
zxt_{32}(R) & \text{if } lhu(I) \lor lbu(I) \\
sxt_{32}(R) & \text{otherwise}
\end{cases} \]
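To make the extension and truncation steps concrete, here is a small Python sketch of lv, sv and the I-type effective address. It works on integers instead of bit strings; the explicit width parameter and the boolean flag unsigned (standing for lhu(I) ∨ lbu(I)) are assumptions of the sketch, as is the name ea_itype.

```python
MASK32 = 0xFFFFFFFF

def sxt32(value, width):
    """Sign-extend a width-bit value to 32 bits."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK32

def zxt32(value, width):
    """Zero-extend a width-bit value to 32 bits."""
    return value & ((1 << width) - 1)

def lv(r, d, unsigned):
    """Load value: r was read from memory, d is the access width in bytes."""
    return zxt32(r, 8 * d) if unsigned else sxt32(r, 8 * d)

def sv(reg, d):
    """Store value: the low d bytes of the register content."""
    return reg & ((1 << (8 * d)) - 1)

def ea_itype(base, imm16):
    """Effective address for I-type accesses: base + sign-extended immediate."""
    return (base + sxt32(imm16, 16)) & MASK32
```

For example, loading the byte 0x80 gives 0xFFFFFF80 for a signed load and 0x00000080 for an unsigned one.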
Shift Operations
Definition 2.11 (Shift Operations)
The shift instructions
\[ shift(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} sll(I) \lor srl(I) \lor sra(I) \lor sllv(I) \lor srlv(I) \lor srav(I) \]
perform shift operations on the content of the target register. The shift distance
is a number in the interval from 0 to 31 and is defined by:
\[ sdist(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in [0:31] \stackrel{def}{=}
\begin{cases}
\langle c.gpr(rs(I))[4:0] \rangle \bmod 32 & \text{if } I[3] \\
\langle I[10:6] \rangle \bmod 32 & \text{otherwise}
\end{cases} \]
The result of the shift operation is defined by:
\[ sres(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
x[32 - d - 1 : 0]\,0^{d} & \text{if } I[1:0] = 00 \\
0^{d}\,x[31:d] & \text{if } I[1:0] = 10 \\
x[31]^{d}\,x[31:d] & \text{if } I[1:0] = 11
\end{cases} \]
where
\[ d = sdist(c, I), \qquad x = c.gpr(rt(I)) \]
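The three cases of sres correspond to a logical left shift, a logical right shift and an arithmetic right shift. The Python sketch below illustrates this on integers; it takes the operand x, the distance d and the two low instruction bits as explicit parameters, which is a simplification chosen for the sketch.

```python
MASK32 = 0xFFFFFFFF

def sres(x, d, kind):
    """Shift result for a 32-bit value x, distance d in [0, 31], kind = I[1:0]."""
    x &= MASK32
    if kind == 0b00:                         # x[32-d-1:0] followed by d zeros
        return (x << d) & MASK32
    if kind == 0b10:                         # d zeros followed by x[31:d]
        return x >> d
    if kind == 0b11:                         # d copies of x[31] followed by x[31:d]
        sign_fill = (MASK32 << (32 - d)) & MASK32 if x >> 31 else 0
        return (x >> d) | sign_fill
    raise ValueError("no shift result is defined for I[1:0] = 01")
```

The arithmetic variant differs from the logical right shift only when the sign bit x[31] is set.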
Arithmetic and Logic Operations
Definition 2.12 (Arithmetic and Logic Operations)
The arithmetic and logic instructions are denoted by the predicate alu.
\[ alu(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=}
\begin{cases}
I[31:29] = 001 & \text{if } itype(I) \\
I[5:4] = 10 & \text{if } rtype(I)
\end{cases} \]
The ALU operations are executed on 32 bit values and compute a 32 bit result.
The operands of an ALU computation are defined as follows.
• left operand
\[ lop(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \stackrel{def}{=} c.gpr(rs(I)) \]
• right operand
\[ rop(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
c.gpr(rt(I)) & \text{if } rtype(I) \\
sxt_{32}(imm(I)) & \text{if } \neg(rtype(I) \lor I[28]) \\
zxt_{32}(imm(I)) & \text{otherwise}
\end{cases} \]
The result of the computation is defined by:
\[ alures(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
lop(c, I) +_{32} rop(c, I) & \text{if } add(I) \lor addi(I) \lor addu(I) \lor addui(I) \\
lop(c, I) -_{32} rop(c, I) & \text{if } sub(I) \lor subu(I) \\
lop(c, I) \land_{32} rop(c, I) & \text{if } and(I) \lor andi(I) \\
lop(c, I) \lor_{32} rop(c, I) & \text{if } or(I) \lor ori(I) \\
lop(c, I) \oplus_{32} rop(c, I) & \text{if } xor(I) \lor xori(I) \\
lop(c, I) \lor_{32} rop(c, I) & \text{if } addi(I) \\
rop(c, I)[15:0]\,0^{16} & \text{if } lui(I) \\
0^{31}([lop(c, I)] < [rop(c, I)]\ ?\ 1 : 0) & \text{if } slt(I) \lor slti(I) \\
0^{31}(\langle lop(c, I) \rangle < \langle rop(c, I) \rangle\ ?\ 1 : 0) & \text{if } sltu(I) \lor sltui(I)
\end{cases} \]
General Purpose Register File Update
Definition 2.13 (General Purpose Register File Update)
In general, the execution of an instruction produces a result that is
saved in a general purpose register. The predicate gprw defines the set of instructions
which cause an update of the GPR.
\[ gprw(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} alu(I) \lor shift(I) \lor load(I) \lor rmw(I) \lor jal(I) \lor jalr(I) \lor movs2g(I) \]
The 5 bit wide address of the register to be written is defined by:
\[ des(I \in \mathbb{B}^{32}) \in \mathbb{B}^{5} \stackrel{def}{=}
\begin{cases}
1^{5} & \text{if } jal(I) \\
rd(I) & \text{if } rtype(I) \land \neg movs2g(I) \\
rt(I) & \text{otherwise}
\end{cases} \]
The 32 bit input for the register update is computed by the function gprdin
and depends on the current core configuration, the current instruction and the
input from the memory.
\[ gprdin(c \in C_{CORE}, I \in \mathbb{B}^{32}, R \in \mathbb{B}^{8 \cdot d(I)}) \in \mathbb{B}^{32} \stackrel{def}{=}
\begin{cases}
c.pc +_{32} 4 & \text{if } jal(I) \lor jalr(I) \\
lv(I, R) & \text{if } load(I) \lor rmw(I) \\
c.spr(rd(I)) & \text{if } movs2g(I) \\
alures(c, I) & \text{if } alu(I) \\
sres(c, I) & \text{if } shift(I)
\end{cases} \]
Branch Instructions
Definition 2.14 (Jump and Branch Instructions)
The branch instructions are denoted by the predicate branch.
\[ branch(I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} beq(I) \lor bne(I) \lor bltz(I) \lor bgez(I) \lor blez(I) \lor bgtz(I) \]
The execution of a branch instruction depends on a branch condition. The evaluation
of the corresponding condition for every branch instruction is defined by:
\[ btaken(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=}
\begin{cases}
x = y & \text{if } beq(I) \\
x \neq y & \text{if } bne(I) \\
x < 0^{32} & \text{if } bltz(I) \\
x \geq 0^{32} & \text{if } bgez(I) \\
x \leq 0^{32} & \text{if } blez(I) \\
x > 0^{32} & \text{if } bgtz(I)
\end{cases} \]
where
\[ x = c.gpr(rs(I)), \qquad y = c.gpr(rt(I)) \]
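The branch conditions can be illustrated by the following Python sketch. It assumes that the comparisons with $0^{32}$ are meant as two's-complement (signed) comparisons, which the definition above does not state explicitly. The string tags used to select the branch instruction and the helper signed are hypothetical.

```python
def signed(x):
    """Two's-complement value of a 32-bit word (assumed interpretation)."""
    return x - (1 << 32) if x & 0x80000000 else x

def btaken(kind, x, y=0):
    """Evaluate the branch condition for register values x = gpr(rs), y = gpr(rt)."""
    conditions = {
        "beq":  x == y,
        "bne":  x != y,
        "bltz": signed(x) < 0,
        "bgez": signed(x) >= 0,
        "blez": signed(x) <= 0,
        "bgtz": signed(x) > 0,
    }
    return conditions[kind]
```

Under the signed reading, a register holding 0xFFFFFFFF counts as negative, so bltz is taken for it while bgtz is not.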
2.4.2.2 Instruction Execution
Definition 2.15 (Instruction Execution)
We define the transition function for instruction execution $\delta_{instr}$ based on the
current core configuration c, the current instruction I and the input from the
memory R.
\[ \delta_{instr}(c \in C_{CORE}, I \in \mathbb{B}^{32}, R \in (\mathbb{B}^{8} \cup \mathbb{B}^{16} \cup \mathbb{B}^{32} \cup \{\bot\})) \in C_{CORE} \stackrel{def}{=} c' \]
where the components of the new core configuration c' are defined as follows
\[ c'.pc =
\begin{cases}
(c.pc +_{32} 4)[31:28] \circ iindex(I) \circ 00 & \text{if } j(I) \lor jal(I) \\
c.gpr(rs(I)) & \text{if } jr(I) \lor jalr(I) \\
c.pc +_{32} sxt(imm(I) \circ 00) & \text{if } branch(I) \land btaken(c, I) \\
c.pc +_{32} 4 & \text{otherwise}
\end{cases} \]
\[ c'.gpr(x) =
\begin{cases}
gprdin(c, I, R) & \text{if } x = des(I) \land gprw(I) \\
c.gpr(x) & \text{otherwise}
\end{cases} \]
\[ c'.spr(x) =
\begin{cases}
c.gpr(rt(I)) & \text{if } x = rd(I) \land movg2s(I) \\
c.spr(x) & \text{otherwise}
\end{cases} \]
The regular instruction execution definition excludes the processing of the
instruction eret, which is defined later in Section 2.4.3.3.
2.4.3 Interrupt Processing
The instruction execution can be interrupted. This may be due to faulty code
or external events. In such a case the processor jumps to the interrupt service
routine to handle the interrupt. We classify interrupts by three criteria.
• Depending on their origin, interrupts can be internal or external.
– The internal (software generated) interrupts are triggered by events
in the core or memory, e.g. an illegal instruction opcode.
– The external (hardware generated) interrupts are triggered by external
signals, e.g. via the APIC.
• Maskable interrupts can be (temporarily) switched off so that the executed
program keeps control. Non-maskable interrupts are never ignored.
• The resume type of an interrupt defines how the program execution continues
after returning from the interrupt service routine.
– abort - The program execution is aborted.
– repeat - The interrupted instruction is repeated.
– continue - The program execution continues with the next instruction,
i.e. the instruction after the interrupted one.
Furthermore, every interrupt has a priority. The interrupt priority defines the order
of interrupt handling in case of simultaneous interrupts. The highest priority is 0
and the lowest one is 7.
In Table 2.8 we present all supported interrupts. Our focus is on the external
I/O interrupts, part of which are the IPIs. Before defining formally the interrupt
processing in the core we introduce some auxiliary definitions.
interrupt level  shorthand  internal/external  type      maskable  description
0                reset      external           abort     no        reset signal
1                I/O        external           repeat    yes       devices
2                ill        internal           abort     no        illegal instruction
3                mal        internal           abort     no        misaligned
7                ovf        internal           continue  yes       overflow
Table 2.8: Interrupt Types
Assumption 1 (Reset Assumption) We assume in this thesis the absence of reset
interrupts.
2.4.3.1 Auxiliary Definitions for Interrupt Processing
As already mentioned, the cause of an interrupt can be an internal or an external
event. We denote the internal events by the eight bit output value of the
(uninterpreted) function iev, which takes the current core configuration and the
current instruction.
\[ iev :: C_{CORE} \times \mathbb{B}^{32} \to \mathbb{B}^{8} \]
The experienced reader will notice that for page faults we need additional information
from the MMU. However, we omit the page fault signals here and give the
arguments for this simplification at the end of the section.
The external events are computed by the APIC as defined in Definition 2.34
and passed to the core as $eev \in \mathbb{B}^{256}$.
Definition 2.16 (Cause of an Interrupt)
We define the cause of an interrupt by a 32 bit vector ca, in which the
first 8 positions are defined by the internal and external interrupt events and the
remaining bits are set to zero. ca is computed from the current core configuration
c, the external event vector eev and the current instruction I.
\[ ca(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \]
\[ ca(c, eev, I)[j] \stackrel{def}{=}
\begin{cases}
\bigvee_{i=0}^{255} eev[i] & \text{if } j = 1 \\
iev(c, I)[j] & \text{if } j \in [2:7] \\
0 & \text{otherwise}
\end{cases} \]
The bits [2 : 7] in the output of ca are used for internal interrupts. The second
bit denotes an external interrupt delivered by the APIC, as defined later in this
chapter. The first bit denotes a reset interrupt. Since we exclude reset in this
thesis we set ca(c, eev, I)[0] to 0.
Since interrupts may be masked, it is possible that an interrupt recorded in ca
does not influence the core transition.
Definition 2.17 (Masked Cause of Interrupt)
The masked cause of interrupt defines the raised interrupts visible to the core.
In addition to the cause of interrupt, we take into account the mask bits
from the status register. The second status register bit masks device interrupts
and the eighth one masks overflow interrupts.
\[ mca(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}) \in \mathbb{B}^{32} \]
\[ mca(c, eev, I)[j] =
\begin{cases}
ca(c, eev, I)[j] \land c.spr.sr[j] & \text{if } j \in \{1, 7\} \\
ca(c, eev, I)[j] & \text{otherwise}
\end{cases} \]
The least significant bit that is set in mca defines the raised interrupt with the
highest priority.
Definition 2.18 (Interrupt Level)
We define the interrupt level, i.e. the priority of the triggered interrupt, by
the function il.
\[ il(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}) \in \mathbb{N} \stackrel{def}{=} \min\{i \mid mca(c, eev, I)[i] = 1\} \]
Definition 2.19 (JISR Event)
Based on the previous definition we define the JISR predicate of the core as a
disjunction of all masked cause bits.
\[ isJISR(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} \bigvee_{j} mca(c, eev, I)[j] \]
In case of an active JISR signal we have an unmasked interrupt, which has to
be handled, and we jump to the interrupt service routine.
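The chain from cause to masked cause, interrupt level and JISR predicate can be sketched in Python as follows. The sketch passes the output of iev, the external event vector and the status register directly as integers (bit 0 least significant) instead of a core configuration and an instruction; returning None when no bit is set is a choice of the sketch, since the minimum in the definition of il is then taken over an empty set.

```python
def ca(iev, eev):
    """Cause vector: bits [2:7] from the internal events, bit 1 from eev."""
    cause = iev & 0b11111100
    if eev != 0:
        cause |= 1 << 1               # some external event is raised
    return cause                      # bit 0 (reset) stays 0 by Assumption 1

def mca(iev, eev, sr):
    """Masked cause: bits 1 and 7 are kept only if the sr mask bit is set."""
    cause = ca(iev, eev)
    maskable = (1 << 1) | (1 << 7)
    return (cause & ~maskable) | (cause & maskable & sr)

def il(iev, eev, sr):
    """Interrupt level: index of the least significant set bit of mca."""
    m = mca(iev, eev, sr)
    return (m & -m).bit_length() - 1 if m else None

def is_jisr(iev, eev, sr):
    return mca(iev, eev, sr) != 0
```

For instance, with a pending external event and a pending illegal-instruction event, the level is 1 if sr[1] is set and 2 if external interrupts are masked.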
2.4.3.2 JISR
The JISR transition saves the current execution context of the core, stores in-
formation for the triggered interrupt and sets the program counter to the first
instruction of the service routine.
Definition 2.20 (Jump to Interrupt Service Routine)
We define the JISR transition function $\delta_{jisr}$ based on the current core configuration
c, the current instruction I, the external event vector eev, and the input
from the memory R.
\[ \delta_{jisr}(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}, R \in (\mathbb{B}^{8 \cdot d(I)} \cup \{\bot\})) \in C_{CORE} \stackrel{def}{=} c' \]
where the components of the new core configuration c' are defined as follows.
\[ c'.pc = 0 \]
\[ c'.spr(x) =
\begin{cases}
0^{32} & \text{if } x = sr \\
0^{32} & \text{if } x = mode \\
c.sr & \text{if } x = esr \\
zxt_{32}(mca(c, eev, I)) & \text{if } x = eca \\
nextpc & \text{if } x = epc \\
data & \text{if } (x = edata) \land (il(c, eev, I) = 1) \\
c.mode & \text{if } x = emode \\
c.spr(x) & \text{otherwise}
\end{cases} \]
\[ c'.gpr =
\begin{cases}
\delta_{instr}(c, I, R).gpr & \text{if } continue(c, I, eev) \\
c.gpr & \text{otherwise}
\end{cases} \]
We denote
• by $nextpc \in \mathbb{B}^{32}$ the program counter to be restored after JISR
\[ nextpc =
\begin{cases}
\delta_{instr}(c, I, R).pc & \text{if } continue(c, I, eev) \\
c.pc & \text{otherwise,}
\end{cases} \]
• by $data \in \mathbb{B}^{32}$ the interrupt data
\[ data = bin_{32}(k), \]
where $k \in \mathbb{B}^{8}$ is the highest priority of the external signals.
\[ k = \min\{j \mid eev[j] = 1\} \]
In our model we only examine a single type of external interrupts but in
general there are different interrupts delivered through the APIC.
2.4.3.3 Return from Interrupt
The execution of the instruction eret is to be considered as a separate case. It is
the last instruction to conclude the interrupt service routine and to switch back
to the interrupted program.
Definition 2.21 (Return from Interrupt Transition Function)
We define the return from interrupt by $\delta_{eret}$ based on the current core configuration c.
\[ \delta_{eret}(c \in C_{CORE}) \in C_{CORE} \stackrel{def}{=} c' \]
where the new core configuration c' is defined as follows.
\[ c'.pc = c.spr(epc) \]
\[ c'.gpr = c.gpr \]
\[ c'.spr(x) =
\begin{cases}
c.spr(esr) & \text{if } x = sr \\
c.spr(emode) & \text{if } x = mode \\
c.spr(x) & \text{otherwise}
\end{cases} \]
2.4.4 Core Transitions
Now we have the auxiliary definitions necessary to define the transition function
of the processor core.
Definition 2.22 (Core Transition Function)
We define $\delta_{core}$ based on the current core configuration c, the current instruction
I, the external event vector eev and the input from the memory R. $\delta_{core}$
makes a case split on the JISR signal and on the eret predicate. In the absence of
interrupts and if the current instruction is not eret the step of the core is defined
by $\delta_{instr}$.
\[ \delta_{core}(c \in C_{CORE}, eev \in \mathbb{B}^{256}, I \in \mathbb{B}^{32}, R \in (\mathbb{B}^{8 \cdot d(I)} \cup \{\bot\})) \in C_{CORE} \stackrel{def}{=}
\begin{cases}
\delta_{jisr}(c, eev, I, R) & \text{if } isJISR(c, eev, I) \\
\delta_{eret}(c) & \text{if } \neg isJISR(c, eev, I) \land eret(I) \\
\delta_{instr}(c, I, R) & \text{otherwise}
\end{cases} \]
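The case split of the core transition and the return from interrupt can be sketched in Python. The representation is an assumption of the sketch: a core configuration is a dictionary with the keys pc, gpr and spr, special purpose registers are addressed by their alias instead of their index, and the other two transition functions are passed in as callables so that the sketch stays self-contained.

```python
def delta_eret(c):
    """Return from interrupt: restore pc, sr and mode from epc, esr and emode."""
    spr = dict(c["spr"])
    spr["sr"] = c["spr"]["esr"]
    spr["mode"] = c["spr"]["emode"]
    return {"pc": c["spr"]["epc"], "gpr": c["gpr"], "spr": spr}

def delta_core(c, jisr, eret, delta_jisr, delta_instr):
    """jisr and eret are the values of isJISR(c, eev, I) and eret(I);
    delta_jisr and delta_instr stand in for the other transition functions."""
    if jisr:
        return delta_jisr(c)          # an interrupt takes precedence over eret
    if eret:
        return delta_eret(c)
    return delta_instr(c)

# hypothetical configuration at the end of an interrupt service routine
c = {"pc": 0x40, "gpr": [0] * 32,
     "spr": {"sr": 0, "esr": 0x82, "mode": 0, "emode": 1, "epc": 0x1000}}
c2 = delta_core(c, False, True, lambda k: "jisr", lambda k: "instr")
```

The order of the tests mirrors the definition: eret is executed only if no unmasked interrupt is pending.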
2.5 Advanced Programmable Interrupt Controller
The APIC provides a mechanism for interrupt delivery with two main functions.
On the one hand the APIC collects external interrupts and forwards them to the
core. On the other hand over the APIC one core can send and receive interrupts
to/from other cores. Such interrupts are called inter-processor interrupts (IPI).
IPIs are used in multiprocessor systems to implement synchronization points of
system wide functions, e.g. during start up. The system software uses IPIs
to enforce the execution of given tasks on processors, e.g. flushing TLBs after
changes in the page tables. In other words the APIC allows system software
running on different processors to communicate using IPIs. The delivery of the
IPIs is based on the APIC state, which is defined by the APIC registers. The
APIC registers are memory-mapped, i.e. they are accessed using regular memory
accesses to a dedicated memory page. x86 processors support different types
of IPIs and allow different methods of addressing the interrupt targets. IPIs are
specified by a vector and/or a type. The combination of both defines different
mechanisms of delivery from the target APIC to the processor it belongs to. The
set of targets of a given IPI can consist of one, all, or a group of processors. It is
specified depending on the IPI’s destination mode, i.e. physical destination mode
or logical destination mode. In physical destination mode we can either address
a single processor by the corresponding APIC or use a shorthand which specifies
as targets:
• all APICs,
• or only the issuing APIC,
• or all APICs excluding the issuing one.
The logical destination mode provides on x86 architectures more possibilities to
address a subset of processors.
Here we present a simplified APIC model, which is limited to a subset of the
x86 IPIs. We only consider the delivery of level-triggered maskable interrupts in
physical destination mode. We leave out APIC components related only to the
boot phase of the system, to devices and to delivery failures. At the same time,
our model is defined in a way that allows us to extend it easily: the missing
parts can simply be plugged into the following definitions.
2.5.1 APIC configuration
Definition 2.23 (APIC Configuration)
The APIC configuration consists of four registers.
\[ C_{APIC} \equiv [apicId \in \mathbb{B}^{32},\ icr \in \mathbb{B}^{32},\ isr \in \mathbb{B}^{256},\ irr \in \mathbb{B}^{256}] \]
• The APIC ID register (apicId) stores a system wide unique ID of the APIC.
• The interrupt command register (icr) stores the interrupt request, which is
to be sent, and all the information needed for its delivery.
• The in-service register (isr) denotes which interrupt requests are currently
being handled.
• The interrupt request register (irr) stores the interrupt requests waiting to
be handled.
In Table 2.9 we list the APIC registers together with their size and the corre-
sponding byte offset in the APIC page.
〈Offset〉  Register                         Software Read/Write  Description
0         $apicId \in \mathbb{B}^{32}$     Read / Write         APIC ID Register
4         $icr \in \mathbb{B}^{32}$        Read / Write         Interrupt Command Register
16        $isr \in \mathbb{B}^{256}$       Read Only            In-Service Register
48        $irr \in \mathbb{B}^{256}$       Read Only            Interrupt Request Register
Table 2.9: APIC registers
2.5.1.1 APIC Interrupt Command Register
The interrupt command register allows software to specify and send an IPI. Let
us take a closer look at the layout of ICR as defined in Table 2.10.
Bits     Name  Meaning
31 : 24  DEST  destination field
23 : 20        reserved
19 : 18  DSH   destination shorthand
17 : 13        reserved
12       DS    delivery status
11       DM    destination mode
10 : 8   MT    message type
7 : 0    VEC   vector
Table 2.10: ICR Bits
We introduce a shorthand notation for accessing fields of register icr.
• icr.VEC denotes the interrupt vector.
\[ icr.VEC \in \mathbb{B}^{8} = icr[7:0] \]
• icr.MT denotes the interrupt's type.
\[ icr.MT \in \mathbb{B}^{3} = icr[10:8] \]
We only use interrupts of type Fixed, i.e. $MT = 0^{3}$.
• icr.DM stores the destination mode.
\[ icr.DM \in \mathbb{B} = icr[11] \]
Physical destination mode is denoted by DM = 0.
• icr.DS stores the delivery status, i.e. whether the APIC is currently delivering
an IPI (DS = 1) or not (DS = 0).
\[ icr.DS \in \mathbb{B} = icr[12] \]
• icr.DSH stores the destination shorthand of the receivers of the IPI.
\[ icr.DSH \in \mathbb{B}^{2} = icr[19:18] \]
For the set of receivers there are four possibilities
\[ icr.DSH \in \{ALL\text{-}BUT\text{-}SELF, ALL, SELF, DESTINATION\} \]
where DESTINATION denotes an empty shorthand, i.e. the IPI target is
not defined by a shorthand but by the DEST field.
Coding  Destination Shorthand
00      DESTINATION
01      SELF
10      ALL
11      ALL-BUT-SELF
Table 2.11: Possible values for icr.DSH.
• icr.DEST stores the APIC ID of the target in case of the DESTINATION
destination shorthand.
\[ icr.DEST \in \mathbb{B}^{8} = icr[31:24] \]
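The field shorthands of the ICR amount to shifts and masks on the 32 bit register value. The following Python sketch decodes all fields at once; the function name icr_fields and the dictionary representation are hypothetical.

```python
def icr_fields(icr):
    """Split a 32-bit ICR value into its named fields (bit 0 least significant)."""
    return {
        "VEC":  icr & 0xFF,            # icr[7:0]
        "MT":   (icr >> 8) & 0x7,      # icr[10:8]
        "DM":   (icr >> 11) & 0x1,     # icr[11]
        "DS":   (icr >> 12) & 0x1,     # icr[12]
        "DSH":  (icr >> 18) & 0x3,     # icr[19:18]
        "DEST": (icr >> 24) & 0xFF,    # icr[31:24]
    }

# hypothetical ICR content: DSH = 11 (ALL-BUT-SELF), DS = 1, everything else 0
fields = icr_fields((0b11 << 18) | (1 << 12))
```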
2.5.1.2 Sending an IPI
To send an IPI, a processor programs the ICR of its APIC with the information
about the interrupt and its delivery. The function writeICR stores the given data
(of a proper type) into the ICR of the local APIC. All ICR bits are read/write
except DS. The DS field is read-only for software and is written only by the
APIC hardware. Whenever the ICR is written, DS is automatically set to 1.
This indicates that the ICR currently stores a pending interrupt request. When the
interrupt request is delivered to the targets, the DS bit is cleared by the hardware.
Thus the system software should poll the DS bit before initiating a write to the
ICR. Otherwise it may overwrite a previous interrupt request that is still pending.
Definition 2.24 (Write to ICR)
We define the function writeICR based on the current APIC state apic and the
value data to be written into the ICR. writeICR models the software driven
update of the register together with the semantics of the APIC itself, i.e. the
change of the delivery status.
\[ writeICR(apic \in C_{APIC}, data \in \mathbb{B}^{32}) \in C_{APIC} \stackrel{def}{=} apic[icr \mapsto data'] \]
where
\[ data'[j] =
\begin{cases}
1 & \text{if } j = 12 \\
data[j] & \text{otherwise}
\end{cases} \]

Interrupt Request irr[i]  Interrupt in Service isr[i]  Description
0                         0                            no new request, no request in service
0                         1                            no new request, request in service
1                         0                            pending request, no request in service
1                         1                            pending request, request in service
Table 2.12: States of an incoming interrupt.
Software Condition 1 (Software IPI) For software generated IPIs we have to
write the proper vector $VEC = 0^{8}$ into the ICR. We have to satisfy our model
restrictions and use only physical destination mode DM = 0 and fixed interrupts
MT = Fixed. Furthermore, for simplicity we only consider IPIs that interrupt all
processors but the sender, i.e. destination shorthand DSH = ALL-BUT-SELF.
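The write to the ICR and Software Condition 1 can be combined into a short Python sketch. It assumes that the APIC state is a dictionary with the four register names as keys; the helper names ipi_request and can_send are hypothetical, and can_send merely illustrates the polling of the DS bit recommended above.

```python
DS_BIT = 1 << 12

def write_icr(apic, data):
    """Return the APIC state with icr := data, where DS (bit 12) is forced to 1."""
    new_apic = dict(apic)
    new_apic["icr"] = (data | DS_BIT) & 0xFFFFFFFF
    return new_apic

def ipi_request():
    """ICR value satisfying Software Condition 1:
    VEC = 0, MT = Fixed (000), DM = 0 (physical), DSH = 11 (ALL-BUT-SELF)."""
    vec, mt, dm, dsh = 0, 0b000, 0, 0b11
    return vec | (mt << 8) | (dm << 11) | (dsh << 18)

def can_send(apic):
    """DS = 0 means that no previously written request is still pending."""
    return (apic["icr"] & DS_BIT) == 0

apic = {"apicId": 1, "icr": 0, "isr": 0, "irr": 0}
sent = write_icr(apic, ipi_request()) if can_send(apic) else apic
```

After the write the DS bit is set, so a second call to can_send on the new state returns False until the hardware clears the bit.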
2.5.1.3 Receiving an IPI
For every received interrupt the APIC stores two bits, one in IRR and one in ISR.
The index of the corresponding bits is defined by the interrupt vector. Vector
$vec = 0^{8}$ denotes interrupts sent by other processors' APICs. All other vectors
are used for device interrupts. The process of handling an IPI can be in one of
the four states shown in Table 2.12.
The incoming requests are stored in IRR. An IPI request remains pending
until the core is ready to service it. The APIC delivers a pending IPI to the
core via the external event signal if the core is not currently servicing an IPI of
the same type. A pending IPI delivered to the core causes JISR if:
• there is no reset interrupt,
• IPIs are not masked.
When the jump to the interrupt service routine occurs, the state of the interrupt
changes to "in-service", the IRR bit is cleared and the corresponding ISR bit is
set to one. After the interrupt is handled the in-service bit is cleared. Figure 2.2
shows the possible transitions.
Figure 2.2: APIC receiver state (the states are the four combinations of IRR[0] and ISR[0]; the transitions are labeled receiveIPI, JISR and eret).
It is possible that an interrupt request is delivered to the APIC and the cor-
responding IRR bit is already set, e.g. if several processors simultaneously send
an IPI to the same target. In such a case the IRR state does not change. The
APIC does not stack multiple requests of the same type and will deliver only one
of the simultaneously arriving interrupts to the processor. The system software
must provide data structures and implement a software protocol to ensure that
every single IPI is served. An interrupt request can become pending while a previous
request of the same type is being serviced.
2.5.2 APIC Transition
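Definition 2.25 (APIC Reads)
For reading the APIC registers we define readAPIC based on the current APIC
state apic and an offset in the APIC page offset.
\[ readAPIC(apic \in C_{APIC}, offset \in \mathbb{B}^{7}) \in (\mathbb{B}^{32} \cup \mathbb{B}^{256} \cup \{\bot\}) \stackrel{def}{=}
\begin{cases}
apic.apicId & \text{if } \langle offset \rangle = 0 \\
apic.icr & \text{if } \langle offset \rangle = 4 \\
apic.isr & \text{if } \langle offset \rangle = 16 \\
apic.irr & \text{if } \langle offset \rangle = 48 \\
\bot & \text{otherwise}
\end{cases} \]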
Definition 2.26 (APIC Writes)
We define writeAPIC based on the current APIC state apic, an offset in the
APIC page offset and the value v to be written into the corresponding register.
If the offset does not define a register writable by software, the write is
ignored.
\[ writeAPIC(apic \in C_{APIC}, offset \in \mathbb{B}^{7}, v \in \mathbb{B}^{32}) \in C_{APIC} \stackrel{def}{=}
\begin{cases}
writeAPICID(apic, v) & \text{if } \langle offset \rangle = 0 \\
writeICR(apic, v) & \text{if } \langle offset \rangle = 4 \\
apic & \text{otherwise}
\end{cases} \]
where writeAPICID is trivially defined as follows
\[ writeAPICID(apic \in C_{APIC}, data \in \mathbb{B}^{32}) \in C_{APIC} \stackrel{def}{=} apic[apicId \mapsto data] \]
Definition 2.27 (APIC Transition)
We define the transition function of the APIC $\delta_{apic}$ based on the input alphabet
\[ \Sigma_{APIC} \stackrel{def}{=} (\mathbb{B}^{7} \times \mathbb{B}^{32}) \cup (\{Fixed\} \times \mathbb{B}^{8}) \cup \{jisr, eret\}. \]
The APIC transition has four possible cases.
• A write access to APIC registers initiated by software. The passed data is
written to the register with the corresponding offset.
• Receiving an IPI. The interrupt request register bit defined by the interrupt
vector is set to one.
• As a part of JISR the pending interrupt with the highest priority gets in-service.
• At the end of the service routine, i.e. at return from interrupt, the ISR bit
corresponding to the serviced interrupt is cleared.
\[ \delta_{apic}(apic \in C_{APIC}, in \in \Sigma_{APIC}) \in C_{APIC} \stackrel{def}{=}
\begin{cases}
writeAPIC(apic, offset, v) & \text{if } in = (offset, v) \\
apic[irr[\langle vector \rangle] \mapsto 1] & \text{if } in = (Fixed, vector) \\
apic[irr[i] \mapsto 0, isr[i] \mapsto 1] & \text{if } in = jisr \\
apic[isr[j] \mapsto 0] & \text{if } in = eret
\end{cases} \]
where $i = \min\{k \mid apic.irr[k] = 1\}$ and $j = \min\{k \mid apic.isr[k] = 1\}$.
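The four cases of the APIC transition can be sketched in Python. The encoding is an assumption of the sketch: the APIC state is a dictionary, irr and isr are integers whose bit k stands for irr[k] and isr[k], and inputs are tagged tuples such as ("write", offset, v), ("fixed", vector), ("jisr",) and ("eret",). Leaving the state unchanged when no bit is set is also a choice of the sketch, since the minimum in the definition is then undefined.

```python
def lowest_set_bit(bits):
    """Index of the least significant set bit, or None if bits == 0."""
    return (bits & -bits).bit_length() - 1 if bits else None

def delta_apic(apic, inp):
    new = dict(apic)
    kind = inp[0]
    if kind == "write":                       # in = (offset, v)
        _, offset, v = inp
        if offset == 0:
            new["apicId"] = v
        elif offset == 4:
            new["icr"] = v | (1 << 12)        # writeICR sets the DS bit
        # other offsets are not writable by software: the write is ignored
    elif kind == "fixed":                     # in = (Fixed, vector)
        new["irr"] |= 1 << inp[1]
    elif kind == "jisr":                      # pending request gets in-service
        i = lowest_set_bit(apic["irr"])
        if i is not None:
            new["irr"] &= ~(1 << i)
            new["isr"] |= 1 << i
    elif kind == "eret":                      # serviced interrupt is finished
        j = lowest_set_bit(apic["isr"])
        if j is not None:
            new["isr"] &= ~(1 << j)
    return new

a0 = {"apicId": 0, "icr": 0, "isr": 0, "irr": 0}
a1 = delta_apic(a0, ("fixed", 0))             # IPI arrives
a2 = delta_apic(a1, ("jisr",))                # core jumps to the service routine
a3 = delta_apic(a2, ("fixed", 0))             # second IPI arrives during service
a4 = delta_apic(a3, ("eret",))                # first IPI is done, second pending
```

The run at the end walks through the receiver states of Figure 2.2, including the state in which a request is pending while another one is in service.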
2.6 Translation Lookaside Buffer
TLBs are an essential part of modern memory systems. They cache address
translations for memory virtualization. The contents of the TLBs have to be
synchronized on several occasions, e.g. when a faulty translation is discovered and
must be removed. Synchronized TLB flushes are implemented with IPIs.
A detailed model of a TLB with its operations is presented in [Sch13]. However
in system mode the processor runs without address translation. Since we examine
only processor instruction execution in system mode, i.e. without address transla-
tion, we do not consider the concrete translations cached in the TLB. Furthermore
we are not interested in proofs of the MMU or TLB virtualization. Both topics
have been handled in [Kov13].
Concerning the TLB, we are interested only in properties of TLB flushes
that are synchronized over IPIs. Therefore, in the scope of this work it is sufficient
to have a TLB model that can grow, i.e. by adding new translations to the set of
cached ones, and can be emptied (flushed). We think of TLB entries as being marked
with unique (increasing) identifiers. Since we are not interested in the values of
the TLB entries, we define our model based purely on the identifiers.
2.6.1 TLB configuration
Definition 2.28 (TLB Configuration)
The TLB configuration
\[ C_{TLB} \stackrel{def}{=} [current \in \mathbb{N},\ flushed \in \mathbb{N}] \]
consists of:
• a current counter, which stores the identifier of the translation added last,
• a flushed counter, which stores the identifier of the last translation
added before the last flush.
2.6.2 TLB Transitions
Definition 2.29 (TLB Transition Function)
We define the transition function of the TLB $\delta_{tlb}$ based on the input alphabet
\[ \Sigma_{TLB} \stackrel{def}{=} add \cup flush. \]
When we add a new translation in our model, we increase the current counter
by one. When we flush the TLB, we copy the value of the current counter into
the flushed counter.
\[ \delta_{tlb}(tlb \in C_{TLB}, in \in \Sigma_{TLB}) \in C_{TLB} \stackrel{def}{=}
\begin{cases}
[tlb.current + 1,\ tlb.flushed] & \text{if } in = add \\
[tlb.current,\ tlb.current] & \text{if } in = flush
\end{cases} \]
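The counter-based TLB model is small enough to be written down directly. The Python sketch below uses a named tuple for the configuration and the strings "add" and "flush" for the two inputs; both are representation choices of the sketch.

```python
from collections import namedtuple

TLB = namedtuple("TLB", ["current", "flushed"])

def delta_tlb(tlb, inp):
    """TLB transition: 'add' increments current, 'flush' copies current to flushed."""
    if inp == "add":
        return TLB(tlb.current + 1, tlb.flushed)
    if inp == "flush":
        return TLB(tlb.current, tlb.current)
    raise ValueError("unknown TLB input: " + repr(inp))

tlb = TLB(current=0, flushed=0)
for step in ["add", "add", "flush", "add"]:
    tlb = delta_tlb(tlb, step)
```

After two additions, a flush and one more addition, the configuration is current = 3 and flushed = 2.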
2.7 MIPSP Virtualization Restrictions
Modern processors offer extensive hardware support for virtualization. Moreover,
only with hardware virtualization support can we implement hypervisors for
full virtualization without modifying the guest.
Guest execution is initiated by the instruction VMRUN. VMRUN switches the
context and starts the instruction execution of guest instructions. During guest
execution guest steps are interleaved with MMU steps. This continues until an
intercept occurs. In this case the context is switched from guest to hypervisor
by the hardware step VMEXIT. VMEXIT is the counterpart of the instruction
VMRUN. It saves the guest state in the memory, restores the hypervisor state and
changes the processor mode. After VMEXIT the processor execution continues
with the hypervisor instruction following VMRUN. The hypervisor handles the
reason for the intercept and starts the guest again.
However, the main topic of this work is the communication between processors
executing hypervisor kernel code. Our statements and proofs are not
influenced by guest steps and the implementation of context switches between
guest and hypervisor. Therefore we omit the definitions for saving/restoring the
context and the specification of the guest intercept mechanism. It is sufficient to
know that a pending IPI during guest execution will cause a context switch.
We use a single guest step in our MIPSP model as an abstraction of all steps
in guest mode. This includes guest instruction execution and MMU steps.
We reduce the effect of VMRUN/VMEXIT to mode change and the effect of
guest steps to adding new translations into the TLB.
In this way our model is an abstraction in which the core always contains
the hypervisor context. In a simulation relation between MIPSP and an extended
virtualization MIPS machine, during guest execution the MIPSP registers will be
mapped not to the real registers of the extended machine, but to the memory
area where they were saved during the mode switch. Furthermore, one would have to
prove:
• that the memory assigned to guests is disjoint from the memory of the
hypervisor and from the APIC page and
• that guests never access hypervisor memory or the APIC.
2.8 Auxiliary Definitions
In order to define formally the overall MIPSP transition we introduce in this
section some additional definitions.
Definition 2.30 (Register Accesses)
We use a shortened notation for registers and other (sub)components which
are uniquely identifiable. For example, for better readability we refer to the program
counter of processor i by
\[ h.pc_i \stackrel{def}{=} h.cpu[i].core.pc \]
and to the delivery status register by
\[ h.DS_i \stackrel{def}{=} h.cpu[i].apic.icr.DS. \]
MIPSP supports two execution modes: a system mode and a user mode. We
let the hypervisor run in system mode and its guests in user mode.
Definition 2.31 (Processor Mode)
The predicate guest denotes whether a given processor operates in user
mode.
\[ guest(core \in C_{CORE}) \in \mathbb{B} \stackrel{def}{=} core.mode[0] = 1 \]
Definition 2.32 (External Interrupt Request)
The predicate irr denotes the existence of a pending interrupt request in the
current APIC configuration.
\[ irr(apic \in C_{APIC}) \in \mathbb{B} \stackrel{def}{=} \bigvee_{j=0}^{255} apic.irr[j] \]
Definition 2.33 (Handling an External Interrupt)
The predicate isr denotes whether currently an external interrupt is in-service.
\[ isr(apic \in C_{APIC}) \in \mathbb{B} \stackrel{def}{=} \bigvee_{j=0}^{255} apic.isr[j] \]
Definition 2.34 (External Event Signals)
We define the external event vector $eev \in \mathbb{B}^{256}$ provided by the APIC
$apic \in C_{APIC}$ to the processor core as follows.
\[ eev(apic)[j] \stackrel{def}{=}
\begin{cases}
0 & \text{if } \exists k \leq j.\ apic.isr[k] = 1 \\
apic.irr[j] & \text{otherwise}
\end{cases} \]
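The external event vector can be computed with a single mask. The following Python sketch represents irr and isr as integers whose bit k stands for irr[k] and isr[k]; this representation is an assumption of the sketch.

```python
def eev(irr, isr):
    """External event vector: a pending request at index j is passed on only
    if no interrupt with an index k <= j is currently in service."""
    if isr == 0:
        return irr
    lowest_in_service = (isr & -isr).bit_length() - 1
    # keep only the requests whose index lies strictly below every in-service index
    return irr & ((1 << lowest_in_service) - 1)
```

In particular, a pending IPI with vector 0 is hidden from the core as long as an IPI with the same vector is in service, and becomes visible again once the ISR bit is cleared.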
Definition 2.35 (Masked External Interrupt Request)
The predicate exint denotes whether in the current configuration an external
interrupt is triggered.
\[ exint(c \in C_{CORE}, apic \in C_{APIC}) \in \mathbb{B} \stackrel{def}{=} \bigvee_{i=0}^{255} eev(apic)[i] \land c.sr[1] \]
Definition 2.36 (External eret)
The predicate eretIPI denotes whether the current instruction is an eret
instruction that ends the service of an IPI.
\[ eretIPI(c \in C_{CORE}, I \in \mathbb{B}^{32}) \stackrel{def}{=} eret(I) \land c.eca[1] \]
Definition 2.37 (Internal Interrupt)
The predicate iint denotes whether in the current configuration an internal
interrupt is triggered.
\[ iint(c \in C_{CORE}, I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{def}{=} isJISR(c, 0^{256}, I) \]
Definition 2.38 (Current Instruction)
The function I returns the current instruction. This is the instruction which
the core will execute next if there is no interrupt.
\[ I(core \in C_{CORE}, m \in C_{MEM}) \stackrel{\mathrm{def}}{=} m_4(core.pc) \]
Software Condition 2 (Aligned Program Counter) Since MIPSP instructions
are encoded in 32 bits, we define a software condition for hypervisor execution
that requires aligned values of the program counter.
\[ \forall core \in C_{CORE}.\ \neg guest(core) \Longrightarrow core.pc[1:0] = 00 \]
Definition 2.39 (APIC Addresses)
The APIC registers are mapped to the addresses from $1^{20}0^{12}$ to
$1^{20}0^{5}1^{7}$. We call this memory region the APIC range and denote it by
\[ A_{apic} \stackrel{\mathrm{def}}{=} [1^{20}0^{12} : 1^{20}0^{5}1^{7}]. \]
For every address $addr \in A_{apic}$ the function aoff computes the aligned
offset in the APIC page.
\[ aoff(addr \in \mathbb{B}^{32}) \in \mathbb{B}^{7} \stackrel{\mathrm{def}}{=} (addr -_{32} 1^{20}0^{12})[6:2]\,00 \]
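As a small worked illustration, the APIC range and the aligned offset can be sketched with plain integers. The sketch assumes that 32-bit bit strings are represented as Python integers; the constant and function names are hypothetical.

```python
APIC_BASE = 0xFFFFF000  # bit string 1^20 0^12
APIC_LAST = 0xFFFFF07F  # bit string 1^20 0^5 1^7
MASK32 = 0xFFFFFFFF

def in_apic_range(addr: int) -> bool:
    """Membership in the APIC range [APIC_BASE : APIC_LAST]."""
    return APIC_BASE <= addr <= APIC_LAST

def aoff(addr: int) -> int:
    """Aligned offset: bits [6:2] of (addr -32 base), with bits [1:0] forced to 00."""
    diff = (addr - APIC_BASE) & MASK32  # subtraction modulo 2^32
    return diff & 0b1111100
```

For example, under these assumptions the byte address 0xFFFFF013 yields the word-aligned offset 16.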
Definition 2.40 (APIC Ports Read)
The predicate $load_{A_{apic}}$ denotes whether the current instruction is a load
from the APIC range.
\[ load_{A_{apic}}(c \in C_{CORE}, I \in \mathbb{B}^{32}) \stackrel{\mathrm{def}}{=} load(I) \wedge ea(c, I) \in A_{apic} \]
Definition 2.41 (APIC Ports Write)
The predicate $store_{A_{apic}}$ denotes whether the current instruction is a store
to the APIC range.
\[ store_{A_{apic}}(c \in C_{CORE}, I \in \mathbb{B}^{32}) \stackrel{\mathrm{def}}{=} store(I) \wedge ea(c, I) \in A_{apic} \]
Definition 2.42 (Memory System)
The function ms defines the processors' view of the memory system. It
comprises the memory and the APIC ports.
If the address a belongs to the APIC, we first have to compute the aligned
offset in the APIC range by aoff(a) and identify the APIC register to which it
belongs. Then we read the entire register, which is either 32 bits (apicId and
icr) or 256 bits (isr and irr) long, using the readAPIC function. Finally, if
the computed offset belongs to isr or irr, we select the proper word in it.
Note that we only allow reading words from the APIC range. This condition must
be maintained by the MIPSP transition from the next section.
\[ ms_d(apic \in C_{APIC}, m \in C_{MEM}) \in C_{MEM} \]
\[ ms_d(apic, m)(a) \stackrel{\mathrm{def}}{=}
\begin{cases}
readAPIC(apic, 16)[aoff(a) - 16 : aoff(a) - 13] & \text{if } a \in A_{apic} \wedge aoff(a) \in [16:47] \\
readAPIC(apic, 48)[aoff(a) - 48 : aoff(a) - 45] & \text{if } a \in A_{apic} \wedge aoff(a) \in [48:69] \\
readAPIC(apic, aoff(a)) & \text{if } a \in A_{apic} \wedge aoff(a) \notin [16:69] \\
m_d(a) & \text{otherwise}
\end{cases} \]
2.9 Transition
We model our concurrent MIPSP machine as an automaton with external inputs.
We denote the set of inputs by $\Sigma_M$ and the transition function of the
automaton by ∆.
\[ \Delta : C_M \times \Sigma_M \rightharpoonup C_M \]
∆ is partial, since for every input a set of conditions must be fulfilled in
order for the transition to be taken. We denote the transition guard by the
predicate G.
\[ G : C_M \times \Sigma_M \to \mathbb{B} \]
Since multiple components of our system may take a step simultaneously, we need
a way to model a non-deterministic semantics as a deterministic one. We handle
this by adding scheduling information to the input parameters of our automaton.
In this way we define execution sequences in which every step is completely
determined by the sequence of inputs passed to the system. Later, when proving
properties of our models, we have to consider all possible input sequences and
executions.
As we said in the beginning of this chapter, the overall transition of our
model is a composition of the transitions of different components. The transition
function makes a case distinction on the stepping component and the kind of
particular step it is executing. In some cases only one component is stepping
and in some other several components are performing a step simultaneously, e.g.
memory and core are involved in the execution of a store instruction. In such
cases one component can be determined as the leading or active one, whereas the
others do not make a step on their own, but instead respond to the step of the
active component. In our model, we assume that the memory can not perform
active steps. Under this assumption the possible active component is either a
core, or a TLB, or an APIC.
Definition 2.43 (Set of Possible Steps)
The input alphabet of the system $\Sigma_M$ denotes the set of possible inputs.
\[ \Sigma_M \stackrel{\mathrm{def}}{=} core \times Pid \ \cup\ ipi \times Pid \ \cup\ guest \times Pid \ \cup\ vmexit \times Pid \]
where
• (core, pid) is the input defining a core step of processor pid.
• (ipi, pid) is the input defining an IPI broadcast step where the APIC of
processor pid is the sender.
• (guest, pid) is the input defining a guest step of processor pid.
• (vmexit, pid) is the input defining a VMEXIT step of processor pid.
Definition 2.44 (Transition Function)
The definition of the transition function is split into four cases, depending on
the input α. We define the new configuration
h′ = ∆(h, α)
for the components that are changed by the transition and omit listing components
that stay unchanged.
• A core step of processor pid.
The guard of the core step transition:
– forbids instruction execution in guest mode,
– forbids fetching of instruction from the APIC page,
– forbids memory accesses to bytes in the memory and in the APIC page
at the same time,
– forbids read-modify-write accesses to the APIC page,
– allows only word read/write accesses to the APIC page.
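\[
\begin{aligned}
G(h, (core, pid)) \stackrel{\mathrm{def}}{=}\ & (isJISR(h.core_{pid}, eev, I) \vee \neg guest(h.core_{pid})) \\
\wedge\ & h.pc_{pid} \notin A_{apic} \\
\wedge\ & ((store(I) \vee load(I)) \wedge ea(h.core_{pid}, I) \notin A_{apic} \Longrightarrow (ea(h.core_{pid}, I) + d(I)) \notin A_{apic}) \\
\wedge\ & (rmw(I) \Longrightarrow ea(h.core_{pid}, I) \notin A_{apic} \wedge (ea(h.core_{pid}, I) + 3) \notin A_{apic}) \\
\wedge\ & ((store(I) \vee load(I)) \wedge d(I) \ne 4 \Longrightarrow ea(h.core_{pid}, I) \notin A_{apic})
\end{aligned}
\]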
The new configuration
h′ = ∆(h, (core, pid))
is defined as follows.
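\[ h'.core_{pid} = \delta_{core}(h.core_{pid}, eev, I, R) \]
\[ h'.apic_{pid} =
\begin{cases}
\delta_{apic}(h.apic_{pid}, jisr) & \text{if } exint(h.core_{pid}, h.apic_{pid}) \\
\delta_{apic}(h.apic_{pid}, eret) & \text{if } eret_{IPI}(h.core_{pid}, I) \\
\delta_{apic}(h.apic_{pid}, (aoff(addr), data)) & \text{if } store_{A_{apic}}(h.core_{pid}, I) \wedge \neg isJISR(h.core_{pid}, eev, I) \\
h.apic_{pid} & \text{otherwise}
\end{cases} \]
\[ h'.tlb_{pid} =
\begin{cases}
\delta_{tlb}(h.tlb_{pid}, flush) & \text{if } flush(I) \\
h.tlb_{pid} & \text{otherwise}
\end{cases} \]
\[ h'.M =
\begin{cases}
\delta_{mem}(h.M, (addr, data)) & \text{if } (rmw(I) \vee store(I)) \wedge addr \notin A_{apic} \wedge \neg isJISR(h.core_{pid}, eev, I) \\
h.M & \text{otherwise}
\end{cases} \]
where
\[ eev =
\begin{cases}
eev(h.apic_{pid}) & \text{if } exint(h.core_{pid}, h.apic_{pid}) \\
0^{256} & \text{otherwise}
\end{cases} \]
\[ I = I(h.core_{pid}, h.M) \]
\[ addr = ea(h.core_{pid}, I) \]
\[ data = sv(h.core_{pid}, I) \]
\[ R = ms_{d(I)}(h.apic_{pid}, h.M)(addr) \]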
• Delivery of an IPI request from the local APIC on processor pid.
The guard of the IPI step transition:
– requires an IPI send request on processor pid,
– IPI of type Fixed,
– physical destination mode.
\[ G(h, (ipi, pid)) \stackrel{\mathrm{def}}{=} h.icr_{pid}.DS = 1 \ \wedge\ h.icr_{pid}.MT = 0^{3} \ \wedge\ h.icr_{pid}.DM = 0 \]
The new configuration
h′ = ∆(h, (ipi, pid))
is defined as follows
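\[ h'.apic_i =
\begin{cases}
\delta_{apic}(h.apic_i, (mt, vector))[DS \mapsto 0] & \text{if } i = pid \wedge target(i) \\
h.apic_i[DS \mapsto 0] & \text{if } i = pid \wedge \neg target(i) \\
\delta_{apic}(h.apic_i, (mt, vector)) & \text{if } i \ne pid \wedge target(i) \\
h.apic_i & \text{otherwise}
\end{cases} \]
where
\[ mt = h.icr_{pid}.MT \]
\[ vector = h.icr_{pid}.VEC \]
\[
\begin{aligned}
target(i) =\ & h.icr_{pid}.DSH = \text{ALL} \\
\vee\ & (h.icr_{pid}.DSH = \text{ALL-BUT-SELF} \wedge i \ne pid) \\
\vee\ & (h.icr_{pid}.DSH = \text{SELF} \wedge i = pid) \\
\vee\ & (h.icr_{pid}.DSH = \text{ID} \wedge h.APIC\_ID_i = h.apic_{pid}.ICR.DEST)
\end{aligned}
\]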
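The target predicate can be read as a decoder of the destination shorthand. The following sketch assumes a hypothetical enumeration Dsh for the four shorthand values (its numeric encoding is arbitrary) and a list apic_ids holding the APIC ID of every processor; these names are illustrative only.

```python
from enum import Enum
from typing import List

class Dsh(Enum):
    # Hypothetical encoding of the destination shorthand field DSH.
    ID = 0
    SELF = 1
    ALL = 2
    ALL_BUT_SELF = 3

def target(i: int, pid: int, dsh: Dsh, dest: int, apic_ids: List[int]) -> bool:
    """Does the IPI sent by the APIC of processor pid address processor i?"""
    if dsh is Dsh.ALL:
        return True
    if dsh is Dsh.ALL_BUT_SELF:
        return i != pid
    if dsh is Dsh.SELF:
        return i == pid
    return apic_ids[i] == dest  # Dsh.ID: compare against the ICR destination
```

Since DSH holds exactly one of the four values, the case distinction is equivalent to the disjunction in the definition.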
• Guest step of processor pid.
A guest step adds a new translation to the TLB. The guard of the guest
step transition requires guest mode of the processor.
\[ G(h, (guest, pid)) \stackrel{\mathrm{def}}{=} guest(h.core_{pid}) \]
The new configuration
\[ h' = \Delta(h, (guest, pid)) \]
is defined as follows.
\[ h'.tlb_{pid} = \delta_{tlb}(h.tlb_{pid}, add) \]
• VMEXIT of processor pid.
The guard of the VMEXIT step transition requires guest mode of the processor.
\[ G(h, (vmexit, pid)) \stackrel{\mathrm{def}}{=} guest(h.core_{pid}) \]
The new configuration
\[ h' = \Delta(h, (vmexit, pid)) \]
is defined as follows.
\[ h'.mode_{pid}[0] = 0 \]
2.9.1 Execution Sequences
Definition 2.45 (Composition of Steps)
The composition of several transition steps defined by a given input sequence β
of length $n \in \mathbb{N}$, starting at an initial configuration h, is defined
recursively as follows:
\[ \Delta^{n}(h \in C_M, \beta \in (\Sigma_M)^{n}) \in C_M \stackrel{\mathrm{def}}{=}
\begin{cases}
h & \text{if } n = 0 \\
\Delta^{n-1}(\Delta(h, \beta_0), \beta[1 : n-1]) & \text{otherwise.}
\end{cases} \]
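The recursive composition can be sketched as a loop over the input sequence. This is only an illustration under assumptions: guard and step stand for G and ∆ and are passed in as functions, undefinedness of the partial function ∆ is modelled by returning None, and the toy machine (a tuple of counters that may not exceed 2) is invented for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

Config = Tuple[int, ...]

def run(h: Config, beta: Sequence[int],
        guard: Callable[[Config, int], bool],
        step: Callable[[Config, int], Config]) -> Optional[Config]:
    """Apply the inputs of beta from left to right; None if some guard fails."""
    for alpha in beta:
        if not guard(h, alpha):
            return None  # the partial transition function is undefined here
        h = step(h, alpha)
    return h

# Toy machine: input alpha increments counter alpha, allowed while it is below 2.
def toy_guard(h: Config, alpha: int) -> bool:
    return h[alpha] < 2

def toy_step(h: Config, alpha: int) -> Config:
    return tuple(v + 1 if i == alpha else v for i, v in enumerate(h))
```

The loop consumes the first input and continues with the remaining ones, which mirrors the recursive case of the definition; the empty sequence returns the initial configuration unchanged.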
Definition 2.46 (Execution Sequence)
Given an initial configuration h and an input sequence β, we define the
execution sequence
\[ h^{0} \beta_0 h^{1} \beta_1 \dots h^{n-1} \beta_{n-1} h^{n} \]
where $h^{0} = h$ and every other configuration $h^{i}$ is obtained by executing
the corresponding input prefix β[0 : i − 1]:
\[ h^{i} = \Delta^{i}(h, \beta[0 : i-1]) \]
We denote execution sequences by
\[ h \xrightarrow{\beta} h', \]
where $|\beta| = n$ and $h' = h^{n}$.
In execution sequences we implicitly assume that all step guards hold. This
implies, for example, that if $\beta_k$ is a core step input, then in the
configuration $h^{k}$ the corresponding processor is in system mode.

Chapter 3
Reordering of Execution
Sequences
In the following sections we want to examine MIPSP execution sequences which
execute compiled code. We want to propagate properties of the C-IL program to
the MIPSP execution. For that purpose we need a simulation relation. The sim-
ulation relation between the program code and the MIPSP steps we call compiler
consistency. Compiler consistency as defined by Andrey Shadrin [Sha12] is sequen-
tial and considers the execution of the C-IL program on one core. Obviously one
C-IL step can correspond to several steps of MIPSP . Furthermore, we know that
MIPSP is a concurrent multiprocessor model, thus in MIPSP execution sequences
steps of different processors can interleave arbitrarily. This makes it impossible
to define a point in which consistency holds for several threads. We are looking
for the possibility to reorder MIPSP execution sequences in a way that allows
to apply a sequential compiler consistency theorem to interleaved multiprocessor
executions.
Proving an order reduction theorem for the general case does not fall in the
scope of this work. We use a generic order reduction theorem proven by Christoph
Baumann [Bau14].
Christoph Baumann defines an abstract ownership based model called COS-
MOS. COSMOS contains several computational units and a global memory. The
memory accesses in COSMOS must follow an ownership-based policy, which guar-
antees the absence of memory races among computational units on local data.
The memory is separated in portions, each of which is either exclusively owned by
a computational unit or shared.
The COSMOS order reduction theorem allows the reordering of executions in
a sequence of blocks of steps of a single processor. The idea is to apply sequential
compiler consistency on the blocks and to interleave only at consistent states. We
call these dedicated states interleaving points.
The main argument for the reordering is that most of the steps of a given
CPU have no impact on the execution of other CPUs. These steps do not access
the shared portion of the global memory and we call them local steps. The effect
of local steps is visible only to a single processor. In contrast to local
steps, the effect of so-called I/O steps is visible not only to the processing
unit but to the whole environment.
Figure 3.1: Arbitrary interleaving.
Figure 3.2: Reordered interleaving.
I/O steps and interleaving points are defined by the verification engineer, and
their choice depends on the ISA model, on the particular program and on the
compiler. On the one hand, we define I/O steps and interleaving points in a way
that satisfies the conditions for the application of the COSMOS order reduction
theorem. On the other hand, we later have to prove that every interleaving point
defined by us is a state that satisfies compiler consistency, and that the I/O
step definition covers all non-local steps in the execution sequence. We aim at
a complete model, but our main focus is on execution sequences containing IPI
service routines. This includes the execution of compiled C-IL hypervisor kernel
code and small assembler portions, which are parts of the ISR. For other
scenarios the sets of interleaving points and I/O steps defined in this thesis
may have to be extended.
In order to apply the reordering theorem we have to instantiate COSMOS
with MIPSP. The instantiation interface contains definitions for I/O steps and
interleaving points. We identify I/O steps based on the resources they access.
The choice of the interleaving points is based on a precondition of the COSMOS
reordering theorem, namely that between two I/O steps there exists at least one
interleaving point. We call this requirement the IOIP condition.
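The IOIP condition can be made concrete with a small checker over the steps of a single processor. The sketch rests on an assumed reading of the condition: io_step[k] says that step k is an I/O step, ip[k] says that the configuration before step k is an interleaving point, and an interleaving point counts as lying between two I/O steps k1 < k2 if it occurs at some k with k1 < k ≤ k2. The function name and the list encoding are hypothetical.

```python
from typing import List

def ioip_holds(io_step: List[bool], ip: List[bool]) -> bool:
    """Check that no two I/O steps occur without an interleaving point between them."""
    pending_io = False  # an I/O step happened and no interleaving point since then
    for k in range(len(io_step)):
        if ip[k]:
            pending_io = False
        if io_step[k]:
            if pending_io:
                return False
            pending_io = True
    return True
```

Under this reading, two directly consecutive I/O steps are acceptable only if the configuration between them is an interleaving point.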
Obviously, in MIPSP the APICs have to be considered shared components, due
to the IPI step, which can modify many APICs at the same time. Thus, steps
accessing an APIC should be considered I/O steps. Unfortunately, the APIC is
read in every core step, which makes every core step an I/O step. This is a very
strong limitation for reordering. In Section 3.1 we present an extension of the
MIPSP model to solve this problem.
In Section 3.2 we define an ownership discipline that implements the COSMOS
model ownership. In Section 3.3 we list some auxiliary definitions. At the end in
Section 3.4 we present an instantiation of COSMOS with the new MIPSP model
from Section 3.1.
3.1 MIPSP Extension
In every core step the APIC is accessed for computing the external interrupt
signal. Accessing the APIC in the absence of pending interrupts has no effect on
the core step and is in a sense superfluous. We modify MIPSP in such a way that
core steps access APIC components only if there is a pending interrupt.
Definition 3.1 (MIPSP Alphabet Extension)
We add to the core step label in our input alphabet a Boolean flag that
identifies APIC accesses.
\[ \Sigma_M \stackrel{\mathrm{def}}{=} core \times Pid \times \mathbb{B} \ \cup\ ipi \times Pid \ \cup\ guest \times Pid \]
From now on we denote the input that specifies a core step of processor pid
by (core, pid, ex).¹ The flag ex identifies the steps where we check for
external interrupts. We now define the external event input to the core
$eev \in \mathbb{B}^{256}$ by a case distinction. If ex is 1 we read the APIC,
otherwise we pass an empty event vector.
\[ eev =
\begin{cases}
eev(h.apic_{pid}) & \text{if } ex \\
0^{256} & \text{otherwise}
\end{cases} \]
We want to show that the extension of the input alphabet of MIPSP does not
change its computation. The input parameter ex has to be bound to the APIC
state. On the one hand we want to read the APIC only if it makes sense to, i.e.
if there is a pending IPI. On the other hand we can not just ignore pending IPIs
by passing an empty event vector.
Definition 3.2 (MIPSP Alphabet Invariant)
For every input sequence we require the value of the external interrupt flag
ex of a core step (core, pid, ex) to be equivalent to the existence of an
external interrupt request.
\[ \forall \beta \in (\Sigma_M)^{*},\ i \in [0 : |\beta| - 1],\ pid \in Pid.\
\beta_i = (core, pid, ex) \Longrightarrow (ex \Longleftrightarrow exint(h^{i}.core_{pid}, h^{i}.apic_{pid})) \]
¹ Instead of using (core, pid) as input.
With this invariant we can easily prove a simulation between a machine with
and a machine without external interrupt flag in its input alphabet. From now on
we consider only executions which satisfy that invariant.
In the following two auxiliary definitions we introduce notation that keeps
later formalisms concise and readable.
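Definition 3.3 (Input Shorthands)
For an input $\alpha \in \Sigma_M$
• α.pid denotes the index of the scheduled processor (all IPI steps are
assigned a dedicated ID equal to the number of processors np in the system)
\[ \alpha.pid \stackrel{\mathrm{def}}{=}
\begin{cases}
i & \text{if } \alpha = (core, i, ex) \\
np & \text{if } \alpha = (ipi, i) \\
i & \text{if } \alpha = (guest, i) \\
i & \text{if } \alpha = (vmexit, i)
\end{cases} \]
• and α.ex denotes whether α defines a core step with an active APIC signal.
\[ \alpha.ex \stackrel{\mathrm{def}}{=}
\begin{cases}
ex & \text{if } \exists i \in Pid.\ \alpha = (core, i, ex) \\
\bot & \text{otherwise}
\end{cases} \]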
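Definition 3.4 (Step Predicates)
We define the predicates core, guest and ipi to denote whether a given input
α defines a core step, a guest step or an IPI step, respectively.
\[ core(\alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \exists i \in Pid, ex \in \mathbb{B}.\ \alpha = (core, i, ex) \]
\[ guest(\alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \exists i \in Pid.\ \alpha = (guest, i) \]
\[ ipi(\alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \exists i \in Pid.\ \alpha = (ipi, i) \]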
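The extended input alphabet and its shorthands can be sketched as a small tagged union. All class and function names below are hypothetical, NP is an assumed processor count chosen for illustration, and ⊥ is modelled by None.

```python
from dataclasses import dataclass
from typing import Optional, Union

NP = 4  # assumed number of processors; also used as the ID of all IPI steps

@dataclass(frozen=True)
class CoreStep:
    pid: int
    ex: bool  # True if the step checks the APIC for external interrupts

@dataclass(frozen=True)
class IpiStep:
    sender: int

@dataclass(frozen=True)
class GuestStep:
    pid: int

@dataclass(frozen=True)
class VmexitStep:
    pid: int

Input = Union[CoreStep, IpiStep, GuestStep, VmexitStep]

def pid_of(alpha: Input) -> int:
    """alpha.pid: the scheduled processor, or NP for every IPI step."""
    return NP if isinstance(alpha, IpiStep) else alpha.pid

def ex_of(alpha: Input) -> Optional[bool]:
    """alpha.ex: the APIC flag of a core step, None (bottom) otherwise."""
    return alpha.ex if isinstance(alpha, CoreStep) else None

def is_core(alpha: Input) -> bool:
    return isinstance(alpha, CoreStep)
```

Mapping every IPI step to the single ID NP treats the IPI delivery as a step of its own pseudo-processor, independent of the sender.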
3.2 Ownership
Ownership talks about memory addresses. The set of all addresses included in an
ownership setting we denote by A ⊆ B32. In our ownership model we separate
A into subsets and define access policy for each of them. We distinguish the
following three kinds of memory.
• The read only memory contains the addresses that can be read by all
processors and written by none.
\[ A_{ro} \subset \mathbb{B}^{32} \]
• The private memory contains the addresses that are assigned to some
processor and can be accessed only by this processor.
\[ A_{pr} \subset \mathbb{B}^{32} \]
We say that private addresses are owned by the processor they are assigned
to.
• The shared memory contains the addresses that can be accessed by all
processors.
\[ A_{sh} \subset \mathbb{B}^{32} \]
During execution the private and the shared addresses can change their own-
ership state. Private addresses may get shared and vice versa. Contrary to that
dynamic behavior the set of all addresses A and of read only addresses Aro are
fixed. Thus the ownership state consists of a static and dynamic part. We simplify
our notation and pass through the execution only dynamic ownership information,
whereas the static information is implicitly known.
Definition 3.5 (Dynamic Ownership Information)
The dynamic state of the ownership model is defined by a mapping O of processor
indices to sets of owned addresses and by a set of shared addresses.
\[ \mathcal{O} \stackrel{\mathrm{def}}{=} [O \in Pid \to 2^{\mathbb{B}^{32}},\ A_{sh} \in 2^{\mathbb{B}^{32}}] \]
In an ownership state $o \in \mathcal{O}$ the set of addresses assigned to
processor i is given by o.O(i). We call this set the processor's owns-set and
denote it by the shorthand $o.O_i$. The set of private addresses in o is
implicitly defined as the union of all owned addresses.
\[ o.A_{pr} \stackrel{\mathrm{def}}{=} \bigcup_{i \in Pid} o.O_i \]
The set of all private addresses not owned by processor i we denote by:
\[ o.O_{\overline{i}} \stackrel{\mathrm{def}}{=} o.A_{pr} \setminus o.O_i \]
As shown in Figure 3.3 the shared memory and the private memory may
overlap. The intersection of the owned and shared addresses defines addresses
that have a single writer but multiple readers. According to that the ownership
state of every memory address a ∈ A can be either read only, or exclusively owned,
or shared owned, or shared unowned.
3.2.1 Ownership Policy
The properties that we maintain on our ownership setting are formally expressed
by the following invariant.
Definition 3.6 (Ownership Invariant)
The predicate ownership-inv denotes whether a given ownership state o is valid.
\[
\begin{aligned}
\text{ownership-inv}(o \in \mathcal{O}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & A = o.A_{pr} \cup o.A_{sh} \cup A_{ro} \\
\wedge\ & o.A_{pr} \cap A_{ro} = \emptyset \\
\wedge\ & o.A_{sh} \cap A_{ro} = \emptyset \\
\wedge\ & \forall a, i, j.\ a \in o.O_i \wedge a \in o.O_j \Longrightarrow i = j
\end{aligned}
\]
Figure 3.3: MIPSP memory partitioning into the ownership sets.
It states, that:
• the complete address space A is covered by the ownership,
• the read only memory is disjoint with the private and shared memory,
• the owns-sets of different processors are disjoint.
Every ownership that satisfies ownership-inv we call valid.
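The invariant can be checked directly on finite sets. The following sketch assumes that addresses are integers, that the owns-sets are given as a dictionary from processor index to a set of addresses, and that the static sets A and Aro are passed explicitly; the parameter names are hypothetical.

```python
from typing import Dict, Set

def ownership_inv(all_addrs: Set[int], ro: Set[int],
                  owns: Dict[int, Set[int]], shared: Set[int]) -> bool:
    """Coverage, disjointness from read-only memory, and disjoint owns-sets."""
    private = set().union(*owns.values())
    covers = all_addrs == private | shared | ro
    ro_disjoint = not (private & ro) and not (shared & ro)
    # Owns-sets are pairwise disjoint iff their sizes add up to the union's size.
    owners_disjoint = sum(len(s) for s in owns.values()) == len(private)
    return covers and ro_disjoint and owners_disjoint
```

Note that an address may be both owned and shared in a valid state; only the overlap of two different owns-sets, or any overlap with the read only memory, is rejected.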
As mentioned above, the addresses of the shared and the private memory can
change their ownership state. They can have different owners and be shared or
not during the execution. Thus, for an execution sequence with n steps we define
a sequence of n + 1 ownership states $o_0$ to $o_n$. The changes in the
ownership are represented by the difference between every two consecutive
ownership states $o_i$ and $o_{i+1}$. We call these changes ownership transfer.
We define the validity of ownership transfer by the following five rules, as
depicted in Figure 3.4.
1. Shared unowned addresses can be exclusively acquired by some processor
and get exclusively owned.
2. Shared unowned addresses can be acquired by some processor and get shared
owned.
3. Shared owned addresses can be exclusively acquired by their owner and get
exclusively owned.
4. Shared owned addresses can be released by their owner and become shared
unowned.
5. Exclusively owned addresses can be released by their owner and become
shared unowned.
Figure 3.4: Ownership transfer
Definition 3.7 (Safe Transfer)
Ownership transfer can happen only on I/O steps between the owns-set of the
corresponding processor and the set of shared addresses. The predicate transfer
denotes in compact form the transfer rules for an address a, where O and
O′ denote the old and the new owns-sets, and sh and sh′ denote the old and the
new sets of shared addresses.
\[
\begin{aligned}
transfer(O \in 2^{\mathbb{B}^{32}}, O' \in 2^{\mathbb{B}^{32}}, sh \in 2^{\mathbb{B}^{32}}, sh' \in 2^{\mathbb{B}^{32}}, a \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & (a \in O \cup sh \Longleftrightarrow a \in O' \cup sh') \\
\wedge\ & (a \notin sh \wedge a \in sh' \Longrightarrow a \in O \wedge a \notin O')
\end{aligned}
\]
The predicate safetransfer denotes whether the ownership pair o and o′
satisfies the validity conditions for ownership transfer on steps of processor
i, where io is a flag indicating I/O steps.
\[ safetransfer(i \in Pid, io \in \mathbb{B}, o \in \mathcal{O}, o' \in \mathcal{O}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}
\begin{cases}
\forall a \in \mathbb{B}^{32}.\ transfer(o.O_i, o'.O_i, o.A_{sh} \setminus o.O_{\overline{i}}, o'.A_{sh} \setminus o'.O_{\overline{i}}, a) & \\
\quad \wedge\ \forall j \in Pid.\ j \ne i \Longrightarrow (a \in o.O_j \Longleftrightarrow a \in o'.O_j) & \\
\quad \wedge\ (o.O_j \cap o.A_{sh} = o'.O_j \cap o'.A_{sh}) & \text{if } io \\
o = o' & \text{otherwise}
\end{cases} \]
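The per-address predicate transfer can be tried out on small sets. This is a minimal sketch assuming integer addresses and Python sets for the old and new owns-sets (O, O2) and the old and new shared sets (sh, sh2); the parameter names are hypothetical.

```python
from typing import Set

def transfer(O: Set[int], O2: Set[int], sh: Set[int], sh2: Set[int], a: int) -> bool:
    """Transfer rules for a single address a."""
    # a neither enters nor leaves the union of owns-set and shared set
    same_domain = (a in (O | sh)) == (a in (O2 | sh2))
    # if a becomes shared, it must have been owned and must be released
    newly_shared = a not in sh and a in sh2
    released = a in O and a not in O2
    return same_domain and (not newly_shared or released)
```

Releasing an exclusively owned address (rule 5) and exclusively acquiring a shared unowned address (rule 1) both satisfy the predicate, while an address that becomes shared and remains owned in the same step does not.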
Memory accesses are considered safe with respect to ownership if they obey
the following rules.
• Local steps of a given processor can read the read-only memory and ad-
dresses owned by the processor.
• Local steps of a given processor can write only addresses exclusively owned
by the processor.
• I/O steps of a given processor can read the read only memory, the shared
memory and addresses owned by the processor.
• I/O steps of a given processor can write addresses owned by the processor
and all shared unowned addresses.
Definition 3.8 (Memory Access Policy)
The predicate safeacc defines our policy for reading and writing memory
addresses by a processor i based on the ownership setting o. R and W represent
the sets of addresses to be read and/or written. io is a flag denoting whether
we are executing an I/O step or not.
\[ safeacc(i \in Pid, io \in \mathbb{B}, R \in 2^{\mathbb{B}^{32}}, W \in 2^{\mathbb{B}^{32}}, o \in \mathcal{O}) \stackrel{\mathrm{def}}{=}
\begin{cases}
(R \subseteq o.O_i \cup o.A_{sh} \cup A_{ro}) \wedge (W \subseteq o.O_i \cup (o.A_{sh} \setminus o.O_{\overline{i}})) & \text{if } io \\
(R \subseteq o.O_i \cup A_{ro}) \wedge (W \subseteq o.O_i \setminus o.A_{sh}) & \text{otherwise}
\end{cases} \]
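The access policy translates almost literally into set inclusions. The sketch below assumes integer addresses, a dictionary owns from processor index to owns-set, and explicit parameters for the shared and read only sets; these names are hypothetical.

```python
from typing import Dict, Set

def safeacc(i: int, io: bool, R: Set[int], W: Set[int],
            owns: Dict[int, Set[int]], shared: Set[int], ro: Set[int]) -> bool:
    """Memory access policy for a step of processor i."""
    mine = owns[i]
    # private addresses owned by processors other than i
    others = set().union(*(s for j, s in owns.items() if j != i))
    if io:
        return R <= (mine | shared | ro) and W <= (mine | (shared - others))
    return R <= (mine | ro) and W <= (mine - shared)
```

With processor 0 owning {1, 2}, processor 1 owning {3}, and {2, 3, 4} shared, a local step of processor 0 may write address 1 but not the shared owned address 2, and an I/O step of processor 0 may write the shared unowned address 4 but not address 3.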
3.2.2 Safe Execution
In order to give a formal definition for safe MIPSP execution we have to define
the memory footprint of MIPSP steps. Furthermore we have to define the set of
addresses accessible to the execution and the read only part of it.
Definition 3.9 (MIPSP Static Ownership Information)
The complete set of addresses of the MIPSP execution is trivially defined as
the full domain of the MIPSP memory.
\[ A_{MIPSP} \stackrel{\mathrm{def}}{=} \mathbb{B}^{32} \]
The read only part contains the memory region where the compiled code
resides, $A_{code} \subset \mathbb{B}^{32}$, and the read only data of the
compiled program, which we here denote by
$A_{ro}^{\text{C-IL}} \subset \mathbb{B}^{32}$. Both are defined later in
Chapter 5.
\[ A_{ro}^{MIPSP} \stackrel{\mathrm{def}}{=} A_{code} \cup A_{ro}^{\text{C-IL}} \]
The set of accessed addresses depends on whether the current instruction is
interrupted. For convenience we introduce a new predicate to denote
external interrupts using the external interrupt flag from our extended alphabet.
Definition 3.10 (JISR Transition Predicate)
The predicate $jisr_{IPI}$ denotes whether the next transition is a core step
with an active JISR signal due to a pending IPI request.
\[ jisr_{IPI}(\alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} core(\alpha) \wedge \alpha.ex \]
In some places it is important to denote the processor id of the processor
being interrupted. Therefore we define a second version of the predicate, which
is additionally parametrized with the id of a particular processor.
\[ jisr_{IPI}(\alpha \in \Sigma_M, i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=} core(\alpha) \wedge \alpha.ex \wedge \alpha.pid = i \]
Definition 3.11 (Read Addresses)
The set of all memory addresses that are read during the execution of a given
step α is defined by the function reads.
\[ reads(core \in C_{CORE}, m \in \mathbb{B}^{32} \to \mathbb{B}^{8}, \alpha \in \Sigma_M) \in 2^{\mathbb{B}^{32}} \stackrel{\mathrm{def}}{=}
\begin{cases}
f(core) \cup r(core, m) & \text{if } core(\alpha) \wedge \neg jisr_{IPI}(\alpha) \wedge \neg(iint(core, I) \vee load_{A_{apic}}(core, I)) \\
f(core) & \text{if } core(\alpha) \wedge \neg jisr_{IPI}(\alpha) \wedge (iint(core, I) \vee load_{A_{apic}}(core, I)) \\
\emptyset & \text{otherwise}
\end{cases} \]
where f(core) denotes the bytes of the instruction being fetched
\[ f(core) = \{core.pc, \dots, core.pc +_{32} 3\}, \]
and r(core, m) denotes the addresses read during instruction execution
\[ r(core, m) =
\begin{cases}
\{ea(core, I), \dots, ea(core, I) +_{32} (d(I) - 1)\} & \text{if } load(I) \vee rmw(I) \\
\emptyset & \text{otherwise}
\end{cases} \]
and I is the current instruction
\[ I = I(h.core_{\alpha.pid}, m). \]
Definition 3.12 (Written Addresses)
The set of all memory addresses that are written during the execution of a
given step α is defined by the function writes.
\[ writes(core \in C_{CORE}, m \in \mathbb{B}^{32} \to \mathbb{B}^{8}, \alpha \in \Sigma_M) \in 2^{\mathbb{B}^{32}} \stackrel{\mathrm{def}}{=}
\begin{cases}
w(core, m) & \text{if } core(\alpha) \wedge \neg jisr_{IPI}(\alpha) \wedge \neg iint(core, I) \wedge \neg store_{A_{apic}}(core, I) \\
\emptyset & \text{otherwise}
\end{cases} \]
where w(core, m) denotes the set of addresses written during instruction
execution and the predicate swap(core, m, I) denotes a read-modify-write
instruction with a successful compare operation
\[ w(u, m) =
\begin{cases}
\{ea(u, I), \dots, ea(u, I) +_{32} (d(I) - 1)\} & \text{if } store(I) \vee swap(u, m, I) \\
\emptyset & \text{otherwise}
\end{cases} \]
and
\[ I = I(h.core_{\alpha.pid}, m). \]
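Both footprints are built from the same kind of byte range with 32-bit modular addition. The following sketch assumes integer addresses; footprint and fetch_addresses are hypothetical helper names standing for the set {ea, ..., ea +32 (d − 1)} and for f(core), respectively.

```python
from typing import Set

MASK32 = 0xFFFFFFFF

def footprint(ea: int, d: int) -> Set[int]:
    """Byte addresses {ea, ..., ea +32 (d - 1)} of an access of width d."""
    return {(ea + k) & MASK32 for k in range(d)}

def fetch_addresses(pc: int) -> Set[int]:
    """The four bytes of the instruction fetched at pc."""
    return footprint(pc, 4)
```

Because the addition is taken modulo 2^32, an access near the top of the address space wraps around to address 0 in this sketch.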
Definition 3.13 (Safe MIPSP Step)
A MIPSP step defined by input α and starting in configuration h is
ownership-safe according to the ownership setting pair o and o′ if it obeys the
memory access policy and maintains the ownership invariant.
\[
\begin{aligned}
safestep(h \in C_M, \alpha \in \Sigma_M, o \in \mathcal{O}, o' \in \mathcal{O}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & safeacc(\alpha.pid, IOstep(h, \alpha), rs(h, \alpha), ws(h, \alpha), o) \\
\wedge\ & safetransfer(\alpha.pid, IOstep(h, \alpha), o, o')
\end{aligned}
\]
where the functions rs(h, α) and ws(h, α) denote the sets of addresses which
are read and written, respectively, by α.
\[ rs(h, \alpha) = reads(h.core_{\alpha.pid}, h.m, \alpha) \]
\[ ws(h, \alpha) = writes(h.core_{\alpha.pid}, h.m, \alpha) \]
The predicate IOstep(h, α) is defined in the next section and denotes whether
the next step is an I/O step.
Definition 3.14 (Safe Execution Sequence)
A MIPSP execution sequence defined by initial configuration h and input
sequence β is ownership-safe according to initial ownership o and final
ownership o′ if there exists an ownership sequence such that every step of the
execution is ownership-safe. IPI steps are ownership-safe by definition.
\[
\begin{aligned}
safeseq_{MIPSP}(h \in C_M, \beta \in (\Sigma_M)^{*}, o \in \mathcal{O}, o' \in \mathcal{O}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & \beta = \varepsilon \wedge o = o' \wedge \text{ownership-inv}(o) \\
\vee\ & \exists o_0, \dots, o_n \in \mathcal{O}.\ o_0 = o \wedge o_n = o' \\
& \wedge\ \forall i < n.\ \text{ownership-inv}(o_i) \wedge (safestep(h^{i}, \beta_i, o_i, o_{i+1}) \vee ipi(\beta_i))
\end{aligned}
\]
3.3 Auxiliary Definitions for Order Reduction
In order to talk formally about the COSMOS reordering theorem we introduce
some definitions. We use them in the formal instantiation of the COSMOS model
and in further sections, when we talk about MIPSP execution sequences.
3.3.1 I/O Steps
Next, we identify the I/O steps in a MIPSP execution.
The majority of the instructions executed by our machine are generated by the
compiler from C-IL code. In C-IL, shared variables have the "volatile"
qualifier, thus the compiler identifies accesses to shared data in the compiled
code. We denote the set of addresses of generated instructions with shared data
access by $A_{io}$. In addition we have to identify the I/O steps which are not
executing instructions but still access some shared component. Depending on the
processor's mode we distinguish two kinds of I/O steps. For the execution in
system mode we define the set of hypervisor I/O steps.
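Definition 3.15 (Hypervisor I/O Steps)
The next step is a hypervisor I/O step if it is a JISR or eret due to an IPI, or
if it accesses shared memory.
\[
\begin{aligned}
IOstep_{HV}(c \in C_{CORE}, \alpha \in \Sigma_M, I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & jisr_{IPI}(\alpha) \\
\vee\ & (core(\alpha) \wedge eret_{IPI}(c, I)) \\
\vee\ & (core(\alpha) \wedge \neg iint(c, I) \wedge (c.pc \in A_{io}))
\end{aligned}
\]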
In Definition 3.15 we implicitly assume the following software condition and
consider therefore only shared accesses originating from the C-IL code.
Software Condition 3 (Hypervisor Assembler Portions) All memory accesses
in the assembler portions of the hypervisor implementation are local steps.
Definition 3.16 (Guest I/O Steps)
The set of guest I/O steps is defined by the predicate $IOstep_{G}$ and
consists of all steps executed in guest mode.
\[ IOstep_{G}(\alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} guest(\alpha) \]
In our model guest steps do not access any shared component. Still, in
possible extensions of MIPSP it may happen that they do, e.g. if MMU steps
access shared memory. Since we do not want to restrict the applicability of
our theory, we choose the weakest possible condition on guest interleaving
points and basically forbid reordering of guest steps.
Definition 3.17 (I/O Steps)
The overall set of I/O steps consists of the hypervisor I/O steps, the guest
I/O steps and the IPI steps.
\[ IOstep(c \in C_{CORE}, \alpha \in \Sigma_M, I \in \mathbb{B}^{32}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} IOstep_{HV}(c, \alpha, I) \vee IOstep_{G}(\alpha) \vee ipi(\alpha) \]
Definition 3.18 (Predicate Overloading)
For convenience, when we consider executions of the multiprocessor machine
rather than a single core, we use the following (overloaded) definitions of I/O
steps.
\[ IOstep_{HV}(h \in C_M, \alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} core(\alpha) \wedge IOstep_{HV}(h.core_{\alpha.pid}, \alpha, I(h.core_{\alpha.pid}, h.m)) \]
\[ IOstep(h \in C_M, \alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} IOstep_{HV}(h, \alpha) \vee IOstep_{G}(\alpha) \vee ipi(\alpha) \]
Next in order to shorten notation we overload the predicate eretIPI from
Definition 2.36.
Definition 3.19 (External eret Shorthand)
The predicate $eret_{IPI}$ denotes whether the next step is a core step of a
given processor i that executes an eret instruction which ends an ISR of an IPI.
\[ eret_{IPI}(h \in C_M, \alpha \in \Sigma_M, i \in Pid) \stackrel{\mathrm{def}}{=} core(\alpha) \wedge \alpha.pid = i \wedge eret(I(h.core_{\alpha.pid}, h.m)) \wedge h.core_{\alpha.pid}.eca[1] \]
3.3.2 Interleaving points
The set of interleaving points is defined for a given program and compiler. The
compiler identifies consistency points in the C-IL execution, i.e. points in
which compiler consistency holds between the C-IL machine and the MIPSP machine
executing the compiled code. Knowing the step that starts in a consistent C-IL
state, the compiler can determine the instruction that is executed in the
corresponding MIPSP state. Furthermore, our choice of interleaving points
relates to a condition of the COSMOS reordering theorem. It is required that
between every two I/O steps of a given execution there exists at least one
interleaving point. Similar to the I/O steps, interleaving points are defined
depending on the processor's mode. Most of the hypervisor interleaving points
originate from the compiled program. By $A_{cp}$ we denote a set of instruction
addresses provided by the compiler. Whenever the program counter in a MIPSP
processor points to an instruction whose address is in $A_{cp}$, compiler
consistency has to hold between the MIPSP machine and the corresponding C-IL
configuration. Here we consider $A_{cp}$ from the point of view of the MIPSP
execution and the required minimal set of interleaving points, but compilers are
free to define more interleaving points than the ones listed below.
Figure 3.5: Hypervisor execution sequence with IPI handling, where the ISR takes
place right after a hypervisor I/O step.
First we consider a hypervisor execution sequence.
In the absence of IPIs all hypervisor I/O steps in a MIPSP execution are in-
structions with shared memory accesses. Due to Software Condition 3 we know
that all shared memory accesses originate from C-IL steps processing a shared
access. Thus, in order to satisfy the IOIP condition in the absence of IPIs, it is
sufficient to have a requirement on the compiler to place an interleaving point
between every two I/O steps. When we include IPIs the situation changes signif-
icantly due to the following facts:
• JISR and eret are I/O steps.
• IPIs can occur on the boundary of any two instructions of the program.
These facts imply that JISR may happen immediately after a shared access I/O
step (Figure 3.5) and also that eret may be followed by a shared access I/O step
(Figure 3.6).
If we consider Figure 3.5 and take into account that the compiler cannot
guarantee that JISR occurs only at interleaving points, the only way to satisfy
the IOIP condition is to have an interleaving point right after every shared
access I/O step. Thus $A_{cp}$ contains the addresses of all instructions that
follow the instructions defined by $A_{io}$. To guarantee this, the compiler has
to enforce a certain programming discipline and exclude optimizations in its
code generation algorithm which may break the requirement that every shared
access ends in a consistent state.
Figure 3.6: Hypervisor execution sequence with IPI handling, where the ISR takes
place right before a hypervisor I/O step.
In Figure 3.6 we depict the fact that a step executing eret due to an IPI
can be followed by a shared access I/O step. Since we already require the
compiler to guarantee that shared access I/O steps end in a consistent state,
and since C-IL statements with shared accesses can be translated into several MIPSP
instructions, we cannot require that shared access I/O steps start in a consistent
state. Therefore, in order to satisfy the COSMOS condition, we have to define
the state immediately after the eret step as an interleaving point.
A JISR step ends in a state, which is obviously not consistent. We have
to find a subsequent state in the execution, before the next I/O step, which
we define as an interleaving point. In Chapter 5 we assume that the compiler
will set an interleaving point at the beginning of every C-IL function. We know
that our service routine contains the execution of a handler implemented in C-IL.
Furthermore, due to Software Condition 3 we know that the first I/O step after
JISR can only originate from the C-IL part of the handler. Thus the required
interleaving point after JISR is in our case the one at the beginning of the C-IL
IPI handler, which is also assumed to be defined by the set Acp .
In Figure 3.7 we depict an execution that contains all possible hypervisor I/O
steps. In Chapter 6 we prove that every interleaving point defined by us is a state
that satisfies compiler consistency.
Definition 3.20 (Hypervisor Interleaving Points)
In an execution sequence defined by the initial configuration h and the input
sequence β, the configuration h^k is a hypervisor interleaving point if the next step
executes an instruction from Acp or if h^k is the first configuration after an IPI service
routine.
\[
\begin{aligned}
\mathit{hypIP}(h \in C_M,\ \beta \in (\Sigma_M)^*,\ k \in [0 : |\beta| - 1]) \in \mathbb{B} \stackrel{def}{=}\
  & (\mathit{core}(\beta_k) \land h^{k}.pc_{\beta_k.pid} \in A_{cp}) \\
  & \lor\ (\mathit{core}(\beta_{k-1}) \land \mathit{eret}_{IPI}(h^{k-1}.core_{\beta_{k-1}.pid},\ I))
\end{aligned}
\]
where
\[
I = I(h^{k-1}.core_{\beta_{k-1}.pid},\ h^{k-1}.m)
\]
Figure 3.7: Hypervisor execution sequence with IPI handling and program thread
and handler I/O steps.
Figure 3.8: Processor execution sequence with guest steps followed by a hypervisor
I/O step.
Since every guest step is an I/O step we need to define an interleaving point
between consecutive guest steps.
Definition 3.21 (Guest Interleaving Points)
Every configuration in guest mode is a guest interleaving point.
\[
\mathit{guestIP}(h \in C_M,\ \beta \in (\Sigma_M)^*,\ k \in [0 : |\beta| - 1]) \in \mathbb{B} \stackrel{def}{=} \mathit{IOstep}_G(\beta_k)
\]
Additionally we need one more interleaving point at the end of every guest ex-
ecution. According to the definition of guest I/O steps (Definition 3.16), VMEXIT
is an I/O step. Since there could be a hypervisor I/O step after VMEXIT, we
set an interleaving point after VMEXIT (Figure 3.8). From the program point of
view the first step after the context switch is defined by the instruction following
VMRUN. VMRUN is implemented by a compiler intrinsic on C-IL level. Thus this
interleaving point is also identified by the compiler and included in Acp .
Definition 3.22 (Interleaving Points)
The overall set of interleaving points consists of
• all hypervisor interleaving points,
• all guest interleaving points,
• all configurations after an IPI step and
• all configurations in which some processor does its first step.
\[
\begin{aligned}
\mathit{IP}(h \in C_M,\ \beta \in (\Sigma_M)^*,\ k \in [0 : |\beta| - 1]) \in \mathbb{B} \stackrel{def}{=}\
  & \mathit{ipi}(\beta_{k-1}) \lor \mathit{guestIP}(h, \beta, k) \lor \mathit{hypIP}(h, \beta, k) \\
  & \lor\ (\exists i \in Pid.\ k = \min\{ j \mid i = \beta_j.pid \})
\end{aligned}
\]
3.3.3 Execution Sequences
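Definition 3.23 (Local Steps)
Given a step sequence β of length n the function coresteps returns the set of
indices of steps of processor i contained in β.
\[
\mathit{coresteps}(i \in Pid,\ \beta \in (\Sigma_M)^*) \in 2^{\mathbb{N}} \stackrel{def}{=}
\begin{cases}
\emptyset & \text{if } \beta = \varepsilon \\
\mathit{coresteps}(i, \beta[0 : n-2]) \cup \{n-1\} & \text{if } \beta \neq \varepsilon \land \beta_{n-1}.pid = i \\
\mathit{coresteps}(i, \beta[0 : n-2]) & \text{if } \beta \neq \varepsilon \land \beta_{n-1}.pid \neq i
\end{cases}
\]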
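Definition 3.24 (Local Steps Subsequence)
Given a step sequence β of length n the function coreseq returns the subsequence
of steps of processor i contained in β.
\[
\mathit{coreseq}(i \in Pid,\ \beta \in (\Sigma_M)^*) \in (\Sigma_M)^* \stackrel{def}{=}
\begin{cases}
\varepsilon & \text{if } \beta = \varepsilon \\
\mathit{coreseq}(i, \beta[0 : n-2]) \circ \beta_{n-1} & \text{if } \beta \neq \varepsilon \land \beta_{n-1}.pid = i \\
\mathit{coreseq}(i, \beta[0 : n-2]) & \text{if } \beta \neq \varepsilon \land \beta_{n-1}.pid \neq i
\end{cases}
\]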
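The two recursive definitions above can be read as simple filters over the step sequence. The following is a minimal sketch under the assumption that a step label is reduced to a hypothetical pair `(kind, pid)`; the labels of the thesis carry more information, and the function names merely mirror coresteps and coreseq.

```python
from typing import List, Set, Tuple

# Hypothetical step label: (kind, pid). The labels of the thesis carry more fields.
Step = Tuple[str, int]


def coresteps(i: int, beta: List[Step]) -> Set[int]:
    """Indices of the steps of processor i in beta (cf. Definition 3.23)."""
    return {k for k, (_, pid) in enumerate(beta) if pid == i}


def coreseq(i: int, beta: List[Step]) -> List[Step]:
    """Subsequence of the steps of processor i in beta (cf. Definition 3.24)."""
    return [step for step in beta if step[1] == i]


# Illustrative sequence: processors 0 and 1 interleaved.
beta = [("core", 0), ("core", 1), ("guest", 0), ("core", 1)]
print(sorted(coresteps(0, beta)))  # indices of the steps of processor 0
print(coreseq(1, beta))            # steps of processor 1 in local order
```

The iterative form visits the steps in the same order in which the recursion appends them, so both formulations yield the same set and the same subsequence.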
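Definition 3.25 (I/O Steps Subsequence)
Given a step sequence β of length n and an initial configuration h the function
IOsteps returns the subsequence of I/O steps contained in β.
\[
\mathit{IOsteps}(h \in C_M,\ \beta \in (\Sigma_M)^*) \in (\Sigma_M)^* \stackrel{def}{=}
\begin{cases}
\varepsilon & \text{if } \beta = \varepsilon \\
\mathit{IOsteps}(h, \beta[0 : n-2]) \circ \beta_{n-1} & \text{if } \beta \neq \varepsilon \land \mathit{IOstep}(h^{n-1}, \beta_{n-1}) \\
\mathit{IOsteps}(h, \beta[0 : n-2]) & \text{if } \beta \neq \varepsilon \land \lnot \mathit{IOstep}(h^{n-1}, \beta_{n-1})
\end{cases}
\]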
Definition 3.26 (Equivalent Execution Sequence)
We consider two step input sequences β, ω ∈ (ΣM)∗ equivalent if they contain
the same number of steps, the local order of steps of every processor is the same
and the global order of the I/O steps is the same.
\[
\begin{aligned}
\beta \stackrel{o}{=} \omega \ \stackrel{def}{=}\ & |\beta| = |\omega| \\
  & \land\ \forall i \in Pid.\ \mathit{coreseq}(i, \beta) = \mathit{coreseq}(i, \omega) \\
  & \land\ \forall h \in C_M.\ \mathit{IOsteps}(h, \beta) = \mathit{IOsteps}(h, \omega)
\end{aligned}
\]
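As an illustration of the equivalence relation, the following sketch checks the three conjuncts on concrete sequences. It is a simplification, not the formal definition: the dependence of IOstep on the configuration is replaced by a hypothetical predicate `is_io` on step labels, and a step label is assumed to be a triple `(kind, pid, local sequence number)`.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical step label: (kind, pid, local sequence number).
Step = Tuple[str, int, int]


def coreseq(i: int, beta: List[Step]) -> List[Step]:
    """Steps of processor i in their local order."""
    return [s for s in beta if s[1] == i]


def io_steps(beta: List[Step], is_io: Callable[[Step], bool]) -> List[Step]:
    """I/O steps of all processors in their global order."""
    return [s for s in beta if is_io(s)]


def equivalent(beta: List[Step], omega: List[Step],
               pids: Iterable[int], is_io: Callable[[Step], bool]) -> bool:
    """Same length, same local orders, same global order of I/O steps."""
    return (
        len(beta) == len(omega)
        and all(coreseq(i, beta) == coreseq(i, omega) for i in pids)
        and io_steps(beta, is_io) == io_steps(omega, is_io)
    )


def is_io(step: Step) -> bool:
    # Assumption for this example only: I/O steps are tagged "io".
    return step[0] == "io"


beta = [("local", 0, 0), ("io", 1, 0), ("local", 0, 1), ("io", 0, 2)]
omega = [("io", 1, 0), ("local", 0, 0), ("local", 0, 1), ("io", 0, 2)]
swapped = [("local", 0, 0), ("local", 0, 1), ("io", 0, 2), ("io", 1, 0)]
print(equivalent(beta, omega, [0, 1], is_io))    # local steps moved, I/O order kept
print(equivalent(beta, swapped, [0, 1], is_io))  # I/O order changed
```

In the example, `omega` only moves local steps of processor 0 past an I/O step of processor 1, whereas `swapped` exchanges the two I/O steps and is therefore not equivalent to `beta`.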
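Definition 3.27 (IOIP Condition)
The IOIP predicate denotes that in a given execution sequence
• the first step of every processor starts at an interleaving point and
• at least one interleaving point exists between every two I/O steps of every
processor.
\[
\begin{aligned}
\mathit{IOIP}(h \in C_M,\ \beta \in (\Sigma_M)^*) \in \mathbb{B} \stackrel{def}{=}\ & \forall pid \in Pid,\ k \in [0 : |\beta| - 1]. \\
  & (k = \min\{ i \mid i \in \mathit{coresteps}(pid, \beta) \} \implies \mathit{IP}(h, \beta, k)) \\
  & \land\ \forall i, j \in \mathit{coresteps}(pid, \beta).\ \mathit{IOstep}(h^{i}, \beta_i) \land \mathit{IOstep}(h^{j}, \beta_j) \land i < j \\
  & \qquad \implies \exists l \in (\mathit{coresteps}(pid, \beta) \cap [i+1 : j]).\ \mathit{IP}(h, \beta, l)
\end{aligned}
\]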
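The IOIP condition can be checked mechanically on a finite sequence. The sketch below is an illustration under simplifying assumptions: the sequence is given as a list of processor identifiers, and the predicates IOstep and IP are passed in as hypothetical functions `is_io` and `is_ip` on step indices, which hides their dependence on the configurations. It checks only consecutive I/O steps of a processor; this suffices because an interleaving point found between consecutive I/O steps also lies between any earlier and any later I/O step.

```python
from typing import Callable, List


def ioip(pids_of_steps: List[int],
         is_io: Callable[[int], bool],
         is_ip: Callable[[int], bool]) -> bool:
    """Check the IOIP condition for every processor that performs a step."""
    for pid in set(pids_of_steps):
        idx = [k for k, p in enumerate(pids_of_steps) if p == pid]
        # The first step of the processor must start at an interleaving point.
        if not is_ip(idx[0]):
            return False
        io_idx = [k for k in idx if is_io(k)]
        # Between two consecutive I/O steps i < j there must be a local step l
        # with i < l <= j that starts at an interleaving point.
        for i, j in zip(io_idx, io_idx[1:]):
            if not any(is_ip(l) for l in idx if i < l <= j):
                return False
    return True


# Illustrative data: step k is performed by processor steps[k].
steps = [0, 0, 0, 1, 0]
io = {0, 2}
print(ioip(steps, lambda k: k in io, lambda k: k in {0, 1, 3}))
print(ioip(steps, lambda k: k in io, lambda k: k in {0, 3}))
```

In the second call no interleaving point separates the I/O steps at indices 0 and 2 of processor 0, so the condition fails.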
Definition 3.28 (IP Schedule)
The predicate IPsched denotes that in a given execution sequence, steps
of different units are interleaved only at interleaving points. In other words,
every two consecutive steps either belong to the same unit or are separated by an
interleaving point.
\[
\mathit{IPsched}(h \in C_M,\ \beta \in (\Sigma_M)^*) \in \mathbb{B} \stackrel{def}{=}
\begin{cases}
1 & \text{if } \beta = \varepsilon \\
1 & \text{if } |\beta| = 1 \\
\mathit{IPsched}(h, \beta[0 : |\beta| - 2]) & \text{if } |\beta| > 1 \land \mathit{IP}(h, \beta, |\beta| - 1) \\
\mathit{IPsched}(h, \beta[0 : |\beta| - 2]) \land \beta_{n-1}.pid = \beta_{n-2}.pid & \text{otherwise}
\end{cases}
\]
where n = |β|.
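Unfolding the recursion of IPsched gives a condition on every pair of neighbouring steps. The following sketch illustrates this reading; it assumes that the sequence is given as a list of unit identifiers and that IP is available as a hypothetical predicate `is_ip` on step indices whose value for index k depends only on the steps up to k, so that evaluating it on a prefix or on the whole sequence makes no difference.

```python
from typing import Callable, List


def ip_sched(pids_of_steps: List[int], is_ip: Callable[[int], bool]) -> bool:
    """Every step k > 0 starts at an interleaving point or continues the unit of step k - 1."""
    return all(
        is_ip(k) or pids_of_steps[k] == pids_of_steps[k - 1]
        for k in range(1, len(pids_of_steps))
    )


# Units switch at indices 2 and 4, both of which are interleaving points.
print(ip_sched([0, 0, 1, 1, 0], lambda k: k in {2, 4}))
# The switch at index 2 is not at an interleaving point.
print(ip_sched([0, 1, 0], lambda k: k in {1}))
```

The empty sequence and one-step sequences satisfy the predicate trivially, matching the two base cases of the definition.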
3.4 Instantiation of COSMOS with MIPSP
In the following lines we show how the COSMOS model can be instantiated with
MIPSP . Christoph Baumann defines in [Bau14] an instantiation interface for
COSMOS and presents an instantiation with a simple MIPS model without APIC.
As we show later, the formal instantiation of COSMOS with an APIC-containing
model requires some smart technical refinements.
3.4.1 COSMOS Instantiation Interface
The COSMOS instantiation interface is defined by the signature of the model.
\[
S \stackrel{def}{=} [\mathcal{A}, \mathcal{V}, \mathcal{R}, nu, \mathcal{U}, \mathcal{E}, reads, \delta, IO, IP]
\]
It consists of types for memory addresses (A) and values (V), a dedicated set
of read-only addresses (R), number of computational units nu, computational
unit configuration type U , set of external inputs E , function defining the set of
addresses read by the next step reads, transition function δ, functions defining the
set of I/O steps and interleaving points IO and IP respectively. We instantiate
COSMOS by instantiating the types and implementing the functions maintaining
their signatures. For the instantiated model we have to prove that the reads-set
depends only on the memory content of the read addresses. Furthermore, we
have to maintain the characteristics of the model. In COSMOS the only shared
resource can be the memory. Therefore in the instantiation we have to handle the
APIC as part of the memory. Furthermore, in COSMOS every step is performed
by some computational unit. Thus we need to add an artificial unit for performing
the IPI steps.
Definition 3.29 (Instantiated COSMOS Machine)
The configuration of a COSMOS machine instantiated with MIPSP is defined
by the record SMIPSP .
\[
S_{MIPSP} \stackrel{def}{=} [\mathcal{A}, \mathcal{V}, \mathcal{R}, nu, \mathcal{U}, \mathcal{E}, reads, \delta, IO, IP]
\]
• SMIPSP .A = (B32 \ Aapic) ∪ Pid
The set of memory addresses excludes the region mapped to the APICs but
includes the set of processor identifiers. The APIC of processor i can be
accessed over memory port i, where the memory distinguishes two different
addressing modes. Usual 32-bit vectors address memory bytes, whereas
natural number processor identifiers refer to the APIC of the corresponding
processor.
• SMIPSP .V = B8 ∪ CAPIC
The set of memory values contains byte values and APIC configurations.
• SMIPSP .R = AroMIPSP
We set the read-only memory to be defined by AroMIPSP .
• SMIPSP .nu = np + 1
We have np + 1 computational units. In addition to the np processors we
have a computational unit performing the IPI steps.
• SMIPSP .U = [core ∈ CCORE , tlb ∈ CTLB ] ∪ Cipi
The state of the first np units is defined by the core and the TLB of the
corresponding processor. The last unit executes only IPI steps and has an
uninterpreted configuration state Cipi.
In the following we use all functions defined in Chapter 2 with unit parameters
instead of core parameters, e.g. we define the next instruction
by
\[
I(u, m) \stackrel{def}{=} I(u.core, m).
\]
In the following definitions we use I as a shorthand for I(u,m).
• SMIPSP .E = ΣM
The set of external inputs is defined by the labels of our transition system.
• SMIPSP .reads defines the set of read addresses for every step. The ad-
dresses of the instruction being fetched f(u) and the addresses read during
instruction execution r(u,m) are defined as in Section 3.2.
In case of steps that access the APIC we have to add the APIC addresses
to the reads-set.
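\[
S_{MIPSP}.reads(u, m, \alpha) =
\begin{cases}
f(u) \cup r(u, m) & \text{if } \mathit{core}(\alpha) \land \lnot \mathit{jisr}_{IPI}(\alpha) \land \lnot(\mathit{iint}(u, I) \lor \mathit{load}_{A_{apic}}(u, I)) \\
f(u) \cup \{\alpha.pid\} & \text{if } \mathit{core}(\alpha) \land \lnot \mathit{jisr}_{IPI}(\alpha) \land \lnot \mathit{iint}(u, I) \land (\mathit{load}_{A_{apic}}(u, I) \lor \mathit{eret}_{IPI}(u, I)) \\
f(u) & \text{if } \mathit{core}(\alpha) \land \lnot \mathit{jisr}_{IPI}(\alpha) \land \mathit{iint}(u, I) \\
\{\alpha.pid\} & \text{if } \mathit{jisr}_{IPI}(\alpha) \\
\{pid\} & \text{if } \alpha = (ipi, pid) \\
\emptyset & \text{otherwise}
\end{cases}
\]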
• SMIPSP .δ defines the transition function for the units of the instantiated
COSMOS machine.
The set of addresses written w(u,m) during instruction execution of in-
struction I is defined as in Section 3.2. The overall set of addresses written
by the transition function is defined as follows.
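\[
W(u, m, \alpha) =
\begin{cases}
w(u, m) & \text{if } \mathit{core}(\alpha) \land \lnot \mathit{jisr}_{IPI}(\alpha) \land \lnot \mathit{iint}(u, I) \land \lnot \mathit{eret}_{IPI}(u, I) \land \lnot \mathit{store}_{A_{apic}}(u, I) \\
\{\alpha.pid\} & \text{if } \mathit{jisr}_{IPI}(\alpha) \lor (\mathit{core}(\alpha) \land \lnot \mathit{iint}(u, I) \land (\mathit{store}_{A_{apic}}(u, I) \lor \mathit{eret}_{IPI}(u, I))) \\
\{0, \ldots, np - 1\} & \text{if } \mathit{ipi}(\alpha) \\
\emptyset & \text{otherwise}
\end{cases}
\]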
As described in [Bau14], SMIPSP .δ gets as an input the state of the unit
executing the step, a partial memory defined by the reads set of the step,
and an external input. The transition function defines the new state of the
unit and an output partial memory, which represents the updated part of
memory for the step.
(u′,m′) = SMIPSP .δ(u,m, α)
Since the transition function of the MIPSP machine is defined for a total
memory, we define a memory M which is a total function by filling in zeros
for the addresses not covered by m.
\[
M(a) =
\begin{cases}
m(a) & \text{if } m(a) \neq \bot \\
0^{8} & \text{otherwise}
\end{cases}
\]
We define the transition function for the four types of possible inputs, i.e.
core steps, IPI steps, guest steps, and VMEXIT steps.
∗ (u′,m′) = SMIPSP .δ(u,m, (core, pid, ex))
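The state u′ of the processing unit u is defined by the core and TLB
transition functions from the previous chapter.
\[
u'.core = \delta_{core}(u.core, eev, I, R)
\]
\[
u'.tlb =
\begin{cases}
\delta_{tlb}(u.tlb, flush) & \text{if } flush(I) \\
u.tlb & \text{otherwise}
\end{cases}
\]
where
\[
eev =
\begin{cases}
eev(m(pid)) & \text{if } ex \\
0^{256} & \text{otherwise}
\end{cases}
\]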
I = M4(u.pc)
addr = ea(u.core, I)
R = msd(I)(m(pid),M)(addr)
The new memory state is defined by the writes, which do not go to
the APIC range, and by the changes of the APIC. We first define M ′
- the state of the total memory M after the execution of the step,
and apic′ - the new APIC state. Then we construct the output partial
memory m′ using M ′ and apic′.
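\[
M' =
\begin{cases}
\delta_{mem}(M, (addr, data)) & \text{if } store(I) \land addr \notin A_{apic} \land \lnot \mathit{isJISR}(u, eev, I) \\
M & \text{otherwise}
\end{cases}
\]
\[
apic' =
\begin{cases}
\delta_{apic}(m(pid), jisr) & \text{if } \mathit{jisr}_{IPI}(core, pid, ex) \\
\delta_{apic}(m(pid), eret) & \text{if } \mathit{eret}_{IPI}(u, I) \\
\delta_{apic}(m(pid), (\mathit{apicoffset}(addr), data)) & \text{if } \mathit{store}_{A_{apic}}(u, I) \land \lnot \mathit{isJISR}(u, eev, I) \\
m(pid) & \text{otherwise}
\end{cases}
\]
\[
m'(a) =
\begin{cases}
M'(a) & \text{if } a \in w(u, m) \\
apic' & \text{if } a = pid \\
\bot & \text{otherwise}
\end{cases}
\]
where
\[
data = sv(u.core, I)
\]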
∗ (u′,m′) = SMIPSP .δ(u,m, (ipi, pid))
IPI steps change only the APIC state in the memory.
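\[
u' = u
\]
\[
m'(i) =
\begin{cases}
\delta_{apic}(m(i), (mt, vector))[DS \mapsto 0] & \text{if } i = pid \land target(i) \\
m(i)[DS \mapsto 0] & \text{if } i = pid \land \lnot target(i) \\
\delta_{apic}(m(i), (mt, vector)) & \text{if } i \neq pid \land target(i) \\
m(i) & \text{otherwise}
\end{cases}
\]
where
\[
mt = m(pid).ICR.MT, \qquad vector = m(pid).ICR.VEC
\]
\[
\begin{aligned}
target(i) =\ & m(pid).ICR.DSH = \mathit{ALL} \\
  & \lor\ (m(pid).ICR.DSH = \mathit{ALL\text{-}BUT\text{-}SELF} \land i \neq pid) \\
  & \lor\ (m(pid).ICR.DSH = \mathit{SELF} \land i = pid) \\
  & \lor\ (m(pid).ICR.DSH = \mathit{ID} \land h.\mathrm{APIC\,ID}_i = m(pid).ICR.DEST)
\end{aligned}
\]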
∗ (u′,m′) = SMIPSP .δ(u,m, (guest, pid))
Guest steps change only the TLB of the current unit.
u′.tlb = δtlb(u.tlb, add)
∗ (u′,m′) = SMIPSP .δ(u,m, (vmexit, pid))
VMEXIT steps change only the mode of the current unit.
u′.core.mode[0] = 0
• SMIPSP .IO(u,m, α) def= IOstep(u.core, α, I(u,m))
The set of IO steps is defined in Definition 3.17.
• SMIPSP .IP The set of interleaving points is defined in Section 3.3.2.
Formally the signature of the COSMOS function differs from the signature of
the function from Definition 3.22. SMIPSP .IP has parameters defining only
the next step, whereas IP is defined on the execution sequence. SMIPSP .IP
is inconvenient for defining interleaving points based on a step that has
already been performed. One has to extend the model with a ghost history
variable, which tells whether the last step was an eret step due to an IPI
or an IPI step. We omit this simple extension here, following our policy
of keeping the model simple and leaving out details that contribute
neither to a better understanding nor to expressing an important
property of the proofs.
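To illustrate the IPI step of the instantiated transition function, the following sketch models only the predicate target(i), which selects the APICs that receive an interrupt sent by processor pid. The field names DSH and DEST and the four shorthand values are taken from the definition above; the Python types, the list `apic_ids` and the function name `is_target` are assumptions made for this example and not part of the formal model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Dsh(Enum):
    """Destination shorthand of the interrupt command register."""
    ID = 0
    SELF = 1
    ALL = 2
    ALL_BUT_SELF = 3


@dataclass(frozen=True)
class Icr:
    dsh: Dsh
    dest: int  # destination APIC ID, used only with Dsh.ID


def is_target(i: int, pid: int, icr: Icr, apic_ids: List[int]) -> bool:
    """Does the APIC of processor i receive the IPI sent by processor pid?"""
    if icr.dsh is Dsh.ALL:
        return True
    if icr.dsh is Dsh.ALL_BUT_SELF:
        return i != pid
    if icr.dsh is Dsh.SELF:
        return i == pid
    return apic_ids[i] == icr.dest  # Dsh.ID


# Hypothetical system with three processors; processor 1 sends the IPI.
apic_ids = [10, 11, 12]
sender = 1
print([is_target(i, sender, Icr(Dsh.ALL_BUT_SELF, 0), apic_ids) for i in range(3)])
print([is_target(i, sender, Icr(Dsh.ID, 12), apic_ids) for i in range(3)])
```

The set of APIC addresses written by such a step is {0, . . . ,np − 1} in the definition of W, independently of which APICs are actually targeted; the targeted ones change, all others keep their state.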
3.4.2 Instantiation Restriction for reads
The restriction for the COSMOS instantiation requires the reads-sets for the next
step defined on two different memory states to be the same if the two memories are
equal at the addresses of the reads-set. The set defined by SMIPSP .reads depends
on the configuration of the computational unit, the input parameter and in some
cases on the fetched instruction. Which instruction we fetch depends only on the
processor’s program counter and on the four consecutive memory bytes starting at
the address stored in the program counter. These are the only memory cells which
influence the output of SMIPSP .reads. Since in our instantiation the instruction
bytes are included in the reads-set, SMIPSP obviously satisfies the requirement.
3.4.3 Reordering Theorem
As a result of the instantiation we can now apply the COSMOS reordering theorem
to SMIPSP execution sequences. It says that if all IP schedule execution sequences
starting in a given configuration fulfill the IOIP condition and are ownership safe,
then we can deduce ownership safety for all execution sequences.
Since we can easily prove that every trace of the MIPSP can be simulated by
an execution of SMIPSP and vice versa the two models are isomorphic. Thus we
can claim the same properties for MIPSP execution sequences.
For MIPSP every possible trace fulfills the IOIP condition by our definition of
I/O steps and interleaving points. All I/O steps except JISR and guest I/O steps
end in an interleaving point.
• After JISR according to Software Condition 3 a processor executes only local
steps before the next interleaving point, where it starts executing compiled
code.
• Guest interleaving steps are followed by VMEXIT, which ends in an inter-
leaving point.
This allows us to simplify the MIPSP version of the COSMOS reorder theorem
in Definition 3.30. Additionally we expose a lemma from [Bau14], which says that
every sequence satisfying the IOIP condition can be reordered into an equivalent
IP schedule sequence.
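Definition 3.30 (MIPSP Order Reduction Theorem)
If all MIPSP IP schedule execution sequences starting in a given configuration
h are ownership safe, then we can deduce ownership safety for every execution
sequence β, and β can be reordered into an equivalent IP schedule ω.
\[
\begin{aligned}
\forall h, \beta, o.\ & (\mathit{ownership\text{-}inv}(o) \\
  & \ \land\ \forall \gamma.\ \exists o'.\ (\mathit{IPsched}(h, \gamma) \implies \mathit{safeseq}_{MIPSP}(h, \gamma, o, o'))) \\
  & \implies (\exists o''.\ \mathit{safeseq}_{MIPSP}(h, \beta, o, o'') \\
  & \qquad \land\ \exists \omega.\ (\omega \stackrel{o}{=} \beta \land \mathit{IPsched}(h, \omega)))
\end{aligned}
\]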
As a consequence of the theorem we will consider from now on only IP schedule
execution sequences. That is, to prove ownership safety of all executions it
is sufficient to prove ownership safety for IP schedules. Furthermore, to prove
simulation of C-IL programs we apply compiler consistency on IP schedules.
In the following figures we present examples for reordered execution sequences.
Figure 3.9 shows reordered uninterrupted hypervisor execution on processor i,
where interleaving happens only after shared accesses.
Figure 3.10 shows reordered guest execution on processor i, where all guest
steps and the VMEXIT step start in an interleaving point.
Figure 3.11 depicts that IPI steps happen from the point of view of processors
only at interleaving points. In the reordered execution sequence the IPI steps
may take place before local processor steps that preceded them in the
original trace. Still, since we have the external interrupt flag in the step labels, the
JISR will happen at the original place with respect to the local execution order of the
processors.
Figure 3.9: Reordered uninterrupted hypervisor execution.
Figure 3.10: Guest execution.
Figure 3.11: Execution of the IPI unit.
Chapter 4
Interrupt Thread
In this chapter we look into the details of reordered processor executions that
contain an interrupt service routine. The complete execution of an interrupt
service routine starts with JISR and ends with the execution of the eret instruction.
In Figure 4.1 we depict an execution sequence which contains interrupt handling
obtained after application of the COSMOS reordering theorem. The oval steps
belong to the service routine and the square steps belong to the interrupted
program. COSMOS provides a competitive technology for generic reordering but
it does not consider interrupts. Using COSMOS ownership alone, it is not possible
to express and prove that the handler execution does not mess up the program
data, since the execution of the interrupt routine has the same access rights as
the interrupted program.
As a matter of fact, interrupt handlers run in a different context than the program.
Thus what we considered a processor execution up to here, we split into
program thread and handler thread from now on, where the handler thread con-
sists of the interrupt service routine. The first part of the interrupt service routine
stores the context of the program thread and dispatches the interrupt to the cor-
responding handler. The second part is the execution of the particular handler.
The third and last part restores the program thread and returns the control to it.
After the service routine the program thread continues from the interrupted
place. We consider the IPI handling from the point of view of the interrupted
hypervisor. The effect of the TLB flushes is visible to the guest but not to the
hypervisor since we do not use address translation in system mode. In order to
express and prove such a property we need to guarantee data separation between
program thread and handler thread except for the shared data. Therefore we need
to refine the ownership model so that we distinguish between handler and program
executions.
In the previous chapter we defined the state after eret as an interleaving point.
So we expect compiler consistency to hold after the service routine. Now that we
consider the interrupt handler a different thread, this can only be guaranteed if
compiler consistency holds also for the configuration before JISR. This of course
Figure 4.1: Execution sequence of processor i that contains IPI service routine.
Oval steps belong to the ISR and the square steps belong to the interrupted
program thread.
cannot be guaranteed in general. In MIPSP IPIs can interrupt the program
thread at an arbitrary point, e.g. in the middle of a program step, which requires
several MIPSP steps. Thus we need another reorder theorem which allows us to
reorder JISR and the service routine to the preceding interleaving point.
The context switch in the service routine is implemented in assembler and
contains several instructions. During the execution of these instructions the core
state of the interrupted program is partially in registers and partially in memory.
This makes it inconvenient to refer to it. Furthermore we want to state that the
saved context stays unchanged during the service routine. For that reason we
introduce a ghost component. The core state of the program thread is saved
atomically and secured during the service routine in this new ghost component.
We extend the MIPSP configuration and the MIPSP step function so that the
register context of the program thread is stored at JISR in a ghost sub-component
of the processor.
Definition 4.1 (MIPSP Extensions)
We extend the processor configuration by a ghost core component. We call
this new ghost component old-core and refer to it by oCore.
\[
C_{CPU} \stackrel{def}{=} [core \in C_{CORE},\ tlb \in C_{TLB},\ apic \in C_{APIC},\ oCore \in C_{CORE}]
\]
The changes in the step function are limited to the core step. All original
components of the new configuration h′ = ∆(h, (core, pid, ex )) are defined as
before (Definition 2.44). Additionally in case of JISR we update oCore with the
current core state.
Figure 4.2: OwnershipXT
\[
h'.oCore_{pid} =
\begin{cases}
h.core_{pid} & \text{if } \mathit{isJISR}(h.core_{pid}, eev, I) \\
h.oCore_{pid} & \text{otherwise}
\end{cases}
\]
In Section 4.1 we define a more fine grained ownership, which distinguishes
between program thread and interrupt thread steps. In Section 4.2.1 we list some
auxiliary definitions. In Section 4.2 we present a reorder theorem, aiming at a
model in which JISR happens only at interleaving points.
4.1 Ownership XT
One property we require of the interrupt handler is that it should not change the
memory owned by the program thread. In order to express that, we define a new
ownership policy. Additionally we show that the new ownership policy implies the
one from Chapter 3.
In Figure 4.2 the enhancement of the ownership model is depicted. The
changes concern only the private memory. The set of owned addresses of every
processor is separated in three sets representing the stack addresses, the program
thread private addresses and the interrupt thread private addresses. The set of
stack addresses is fixed and does not change during execution.
We prevent data races on the stack with the access policy based on the stack
pointer. During program thread execution we have only valid stack data for the
program thread and it resides between the stack base and the stack pointer (Figure
4.3). On JISR we save the stack pointer of the interrupted program.
During the service routine the data belonging to the interrupted program
resides between the stack base and the saved stack pointer. The handler stack
Figure 4.3: Stack memory during program thread execution (left) and handler
thread execution (right). The stack is growing downwards.
data resides between the saved stack pointer of the program thread and the
current stack pointer. In other words, the saved stack pointer of the program
thread serves as a stack base address for the handler thread.
Definition 4.2 (Dynamic Ownership Information XT)
The dynamic state of the ownership is defined by a set of shared addresses
and three mappings of processor indices
• to a program thread owns-set Op,
• to an interrupt thread owns-set Oh and
• to a set of stack addresses Os.
\[
O_{XT} \stackrel{def}{=} [O^{p} \in Pid \to 2^{\mathbb{B}^{32}},\ O^{h} \in Pid \to 2^{\mathbb{B}^{32}},\ O^{s} \in Pid \to 2^{\mathbb{B}^{32}},\ A_{sh} \in 2^{\mathbb{B}^{32}}]
\]
We use the following shorthand notation to refer to the different sets of addresses
assigned to a given processor i in an ownership o ∈ OXT .
\[
o.O^{p}_{i} \stackrel{def}{=} o.O^{p}[i] \qquad
o.O^{h}_{i} \stackrel{def}{=} o.O^{h}[i] \qquad
o.O^{s}_{i} \stackrel{def}{=} o.O^{s}[i]
\]
With o.Oi we refer to the union of all addresses assigned to processor i.
\[
o.O_{i} \stackrel{def}{=} o.O^{p}_{i} \cup o.O^{h}_{i} \cup o.O^{s}_{i}
\]
The overall set of private addresses is the union of all owned addresses.
\[
o.A_{pr} = \bigcup_{i \in Pid} o.O^{p}_{i} \cup o.O^{h}_{i} \cup o.O^{s}_{i}
\]
The set of private addresses not owned by processor i we denote depending
on its mode, i.e. program thread or handler thread mode, by:
\[
\overline{O}^{h}_{i} = A_{pr} \setminus O^{h}_{i} \qquad
\overline{O}^{p}_{i} = A_{pr} \setminus O^{p}_{i}
\]
From the extended ownership we can build an ownership of the type defined
in the previous chapter.
Definition 4.3 (Join of Extended Ownership)
The function join generalizes the information from a given extended ownership
ox ∈ OXT into an ownership of type O.
\[
\mathit{join}(ox \in O_{XT}) \in O \stackrel{def}{=} [O, ox.A_{sh}]
\]
where the first element of the result record, O ∈ Pid → 2^{B^32}, maps processor
identifiers to the union of addresses owned by the corresponding processor in the
ownership ox.
\[
O[i] = ox.O^{h}_{i} \cup ox.O^{p}_{i} \cup ox.O^{s}_{i}
\]
We call the ownership join(ox) the conjoint version of ox or simply conjoint ox.
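The join operation can be illustrated with a small sketch. It assumes that addresses are modelled as integers instead of 32-bit vectors and that the two ownership records are represented by the hypothetical Python classes `OwnershipXT` and `Ownership`; only the field names Op, Oh, Os and Ash follow the definitions above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

Addr = int  # assumption: addresses are plain integers in this sketch


@dataclass(frozen=True)
class OwnershipXT:
    Op: Dict[int, FrozenSet[Addr]]  # program thread owns-sets
    Oh: Dict[int, FrozenSet[Addr]]  # interrupt thread owns-sets
    Os: Dict[int, FrozenSet[Addr]]  # stack addresses
    Ash: FrozenSet[Addr]            # shared addresses


@dataclass(frozen=True)
class Ownership:
    O: Dict[int, FrozenSet[Addr]]   # owns-sets of the conjoint ownership
    Ash: FrozenSet[Addr]


def join(ox: OwnershipXT) -> Ownership:
    """Merge the three per-processor sets into one owns-set per processor."""
    return Ownership(
        O={i: ox.Oh[i] | ox.Op[i] | ox.Os[i] for i in ox.Op},
        Ash=ox.Ash,
    )


ox = OwnershipXT(
    Op={0: frozenset({1})},
    Oh={0: frozenset({2})},
    Os={0: frozenset({3})},
    Ash=frozenset({9}),
)
print(sorted(join(ox).O[0]))
```

The shared set is passed through unchanged, so the conjoint ownership forgets only which thread of a processor owns an address.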
4.1.1 Ownership Policy XT
The validity conditions of the extended ownership include the ones from the pre-
vious chapter and define in addition the properties on the new sets.
Definition 4.4 (Ownership Invariant)
The predicate ownership-invXT defines the invariant of the extended owner-
ship. It states that:
• the complete address space A is covered by the ownership,
• the read only memory is disjoint with the private and the shared memory,
• the owns-sets of different processors are disjoint,
• stack addresses are not shared,
• the set of stack addresses, the program thread owns-set and the interrupt
thread owns-set are disjoint.
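\[
\begin{aligned}
\mathit{ownership\text{-}inv}_{XT}(o \in O_{XT}) \in \mathbb{B} \stackrel{def}{=}\ & A = o.A_{pr} \cup o.A_{sh} \cup A_{ro} \\
  & \land\ o.A_{pr} \cap A_{ro} = \emptyset \\
  & \land\ o.A_{sh} \cap A_{ro} = \emptyset \\
  & \land\ \forall a, i, j.\ a \in o.O_{i} \land a \in o.O_{j} \implies i = j \\
  & \land\ o.A_{sh} \cap \bigcup_{i \in Pid} o.O^{s}_{i} = \emptyset \\
  & \land\ \forall i.\ o.O^{h}_{i} \cap o.O^{p}_{i} = \emptyset \land o.O^{h}_{i} \cap o.O^{s}_{i} = \emptyset \land o.O^{s}_{i} \cap o.O^{p}_{i} = \emptyset
\end{aligned}
\]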
Lemma 4.1 (Ownership Invariant Refinement) If a given extended ownership
o is valid then its conjoint version is also valid.
∀o ∈ OXT . ownership-invXT (o) =⇒ ownership-inv(join(o))
Proof The proof is trivial, since Definition 4.4 contains literally Definition 3.6. □
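The conjuncts of the extended ownership invariant can be checked directly on finite sets. The following sketch assumes integer addresses and represents an extended ownership as a plain dictionary with the keys "Op", "Oh", "Os" and "Ash"; the function names and the example data are hypothetical.

```python
from typing import Dict, Set


def owned(o: Dict, i: int) -> Set[int]:
    """All addresses assigned to processor i."""
    return o["Op"][i] | o["Oh"][i] | o["Os"][i]


def ownership_inv_xt(o: Dict, A: Set[int], Aro: Set[int]) -> bool:
    """Check the conjuncts of the extended ownership invariant on finite sets."""
    pids = list(o["Op"])
    Apr = set().union(*(owned(o, i) for i in pids))
    stacks = set().union(*(o["Os"][i] for i in pids))
    return (
        A == Apr | o["Ash"] | Aro                    # address space is covered
        and not (Apr & Aro)                          # read-only vs. private
        and not (o["Ash"] & Aro)                     # read-only vs. shared
        and all(not (owned(o, i) & owned(o, j))      # owns-sets of different processors
                for i in pids for j in pids if i != j)
        and not (o["Ash"] & stacks)                  # stack addresses are not shared
        and all(not (o["Oh"][i] & o["Op"][i])        # per-processor sets are disjoint
                and not (o["Oh"][i] & o["Os"][i])
                and not (o["Os"][i] & o["Op"][i])
                for i in pids)
    )


A = set(range(8))
Aro = {7}
good = {"Op": {0: {0}, 1: {3}}, "Oh": {0: {1}, 1: {4}},
        "Os": {0: {2}, 1: set()}, "Ash": {5, 6}}
# Address 0 is owned by the program thread of processor 0 and the handler of processor 1.
bad = {"Op": {0: {0}, 1: {3}}, "Oh": {0: {1}, 1: {0, 4}},
       "Os": {0: {2}, 1: set()}, "Ash": {5, 6}}
print(ownership_inv_xt(good, A, Aro), ownership_inv_xt(bad, A, Aro))
```

In the second configuration one address is owned by two different processors, which violates the disjointness conjunct.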
The rules for ownership transfer in the extended ownership are similar to the
ones from the previous chapter. The difference is that we need to distinguish which
thread is running in order to define which sets are allowed to change, since during
program thread execution the handler thread owns-set must stay unchanged and
vice versa. The following rules define the admissible transitions during interrupt
service routine on a processor i. The handler thread can:
1. non-/exclusively acquire shared unowned addresses to the handler owns-set,
2. exclusively acquire shared addresses which are already in the handler owns-
set,
3. release shared/exclusively owned addresses from the handler owns-set.
In case of program thread execution the rules are the same, they just are
related to the program thread owns-set.
In Figure 4.4 we depict the transfer rules.
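Definition 4.5 (Safe Transfer)
The predicate transferXT denotes the transfer rules depending on the processor
mode defined by the parameter ih. If ih is one we refer to handler thread
execution and if ih is zero we refer to program thread execution.
\[
\mathit{transfer}_{XT}(i \in Pid,\ ih \in \mathbb{B},\ o \in O_{XT},\ o' \in O_{XT},\ a) \in \mathbb{B} \stackrel{def}{=}
\begin{cases}
\mathit{transfer}(o.O^{h}_{i},\ o'.O^{h}_{i},\ o.A_{sh} \setminus o.\overline{O}^{h}_{i},\ o'.A_{sh} \setminus o'.\overline{O}^{h}_{i},\ a) & \text{if } ih \\
\mathit{transfer}(o.O^{p}_{i},\ o'.O^{p}_{i},\ o.A_{sh} \setminus o.\overline{O}^{p}_{i},\ o'.A_{sh} \setminus o'.\overline{O}^{p}_{i},\ a) & \text{otherwise}
\end{cases}
\]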
Figure 4.4: Ownership transfer XT.
The predicate safetransferXT denotes whether the extended ownership pair
o and o′ satisfies the validity for ownership transfer on steps of processor i, where
io is a flag indicating I/O steps and ih indicates handler thread execution.
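\[
\mathit{safetransfer}_{XT}(i \in Pid,\ io \in \mathbb{B},\ ih \in \mathbb{B},\ o \in O_{XT},\ o' \in O_{XT}) \in \mathbb{B} \stackrel{def}{=}
\begin{cases}
\forall a \in \mathbb{B}^{32}.\ \mathit{transfer}_{XT}(i, ih, o, o', a) \\
\quad \land\ \forall j \in Pid.\ j \neq i \implies (a \in o.O_{j} \iff a \in o'.O_{j}) & \text{if } io \\
o = o' & \text{otherwise}
\end{cases}
\]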
Lemma 4.2 (Safe Transfer Refinement) If the ownership transfer between the
extended ownership pair o and o′ is safe, then the ownership transfer between their
conjoint versions join(o) and join(o′) is also safe.
\[
\forall i \in Pid,\ io, ih \in \mathbb{B},\ o, o' \in O_{XT}.\
\mathit{safetransfer}_{XT}(i, io, ih, o, o') \implies \mathit{safetransfer}(i, io, \mathit{join}(o), \mathit{join}(o'))
\]
Proof The only thing that we have to prove is that a transfer that satisfies transferXT
also satisfies the rules defined for the conjoint ownership. We make a case
distinction on ih. After unfolding the definitions of safetransferXT , transferXT and
safetransfer we come to similar statements in both cases.
• ih = 1
\[
\begin{aligned}
& \mathit{transfer}(o.O^{h}_{i},\ o'.O^{h}_{i},\ o.A_{sh} \setminus o.\overline{O}^{h}_{i},\ o'.A_{sh} \setminus o'.\overline{O}^{h}_{i},\ a) \implies \\
& \mathit{transfer}(o.O_{i},\ o'.O_{i},\ o.A_{sh} \setminus o.\overline{O}_{i},\ o'.A_{sh} \setminus o'.\overline{O}_{i},\ a)
\end{aligned}
\]
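• ih = 0
\[
\begin{aligned}
& \mathit{transfer}(o.O^{p}_{i},\ o'.O^{p}_{i},\ o.A_{sh} \setminus o.\overline{O}^{p}_{i},\ o'.A_{sh} \setminus o'.\overline{O}^{p}_{i},\ a) \implies \\
& \mathit{transfer}(o.O_{i},\ o'.O_{i},\ o.A_{sh} \setminus o.\overline{O}_{i},\ o'.A_{sh} \setminus o'.\overline{O}_{i},\ a)
\end{aligned}
\]
Both cases are quite similar and can be generalized to the following statement.
\[
\begin{aligned}
& \mathit{transfer}(O_{s}, O'_{s}, sh_{s}, sh'_{s}, a) \land \mathcal{A}(O_{s}, O'_{s}, sh_{s}, sh'_{s}, O, O', sh, sh') \\
& \implies \mathit{transfer}(O, O', sh, sh', a)
\end{aligned}
\]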
Here $O$ and $O_s$ denote sets of owned addresses, $sh$ and $sh_s$ denote sets of
shared addresses, $O'$, $O'_s$, $sh'$ and $sh'_s$ denote the corresponding sets after the
ownership transfer of an address a, and $\mathcal{A}$ is a predicate stating that the sets from
the left hand side of the implication are subsets of the sets from the right hand
side. In that generalization $O_s$ represents $o.O^{p}_{i}$ and $o.O^{h}_{i}$, which are subsets of
$o.O_{i}$ represented by $O$. Furthermore $sh_s$ denotes $o.A_{sh} \setminus o.\overline{O}^{h}_{i}$ and $o.A_{sh} \setminus o.\overline{O}^{p}_{i}$,
which are subsets of $o.A_{sh} \setminus o.\overline{O}_{i}$ represented by $sh$.
\[
\begin{aligned}
\mathcal{A}(O_{s}, O'_{s}, sh_{s}, sh'_{s}, O, O', sh, sh') \stackrel{def}{=}\ & O_{s} \subseteq O \land O'_{s} \subseteq O' \land sh_{s} \subseteq sh \\
  & \land\ sh'_{s} \subseteq sh' \land O \setminus O_{s} = O' \setminus O'_{s} \land sh \setminus sh_{s} = sh' \setminus sh'_{s}
\end{aligned}
\]
We introduce the following shorthands for better readability.
• X = O \Os = O′ \O′s
• Y = sh \ shs = sh′ \ sh′s
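After unfolding the definition of transfer we get:
\[
\begin{aligned}
& (a \in O_{s} \cup sh_{s} \iff a \in O'_{s} \cup sh'_{s}) \\
& \land\ (a \notin sh_{s} \land a \in sh'_{s} \implies a \in O_{s} \land a \notin O'_{s}) \\
& \land\ \mathcal{A}(O_{s}, O'_{s}, sh_{s}, sh'_{s}, O, O', sh, sh') \\
& \implies (a \in O \cup sh \iff a \in O' \cup sh') \\
& \qquad \land\ (a \notin sh \land a \in sh' \implies a \in O \land a \notin O')
\end{aligned}
\]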
We split the conjunctions on both sides and prove them separately, which
gives us an even stronger statement, since
\[
(a \implies c) \land (b \implies d) \implies (a \land b \implies c \land d)
\]
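Case 1:
\[
(a \in O_{s} \cup sh_{s} \iff a \in O'_{s} \cup sh'_{s}) \land \mathcal{A} \implies (a \in O \cup sh \iff a \in O' \cup sh')
\]
\[
\begin{aligned}
a \in O \cup sh & \iff a \in X \cup Y \cup O_{s} \cup sh_{s} \\
  & \iff a \in X \cup Y \cup O'_{s} \cup sh'_{s} && \text{precondition} \\
  & \iff a \in O' \cup sh'
\end{aligned}
\]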
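Case 2:
\[
\begin{aligned}
& (a \notin sh_{s} \land a \in sh'_{s} \implies a \in O_{s} \land a \notin O'_{s}) \land \mathcal{A} \\
& \implies (a \notin sh \land a \in sh' \implies a \in O \land a \notin O')
\end{aligned}
\]
\[
\begin{aligned}
a \notin sh \land a \in sh' & \implies a \notin sh \land a \notin Y \land a \in sh' && Y \subseteq sh \\
  & \implies a \notin sh \land a \in sh'_{s} && Y = sh' \setminus sh'_{s} \\
  & \implies a \notin sh_{s} \land a \in sh'_{s} && sh_{s} \subseteq sh \\
  & \implies a \in O_{s} \land a \notin O'_{s} && \text{precondition} \\
  & \implies a \in O_{s} \land a \notin O'_{s} \land a \notin X && X = O \setminus O_{s} \\
  & \implies a \in O_{s} \land a \notin O' && X = O' \setminus O'_{s} \\
  & \implies a \in O \land a \notin O' && O_{s} \subseteq O
\end{aligned}
\]
□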
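The generalized statement of the proof concerns a single address a, so it can also be checked exhaustively: only the membership of a in each of the eight sets matters, which gives 256 combinations. The following sketch is an independent sanity check and not part of the formal proof. It assumes the unfolded form of transfer shown above, and it uses the pointwise consequences of the side condition for the address a, which is a weaker hypothesis than the set-level predicate, so a successful check covers the original statement.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def transfer(o: bool, o2: bool, sh: bool, sh2: bool) -> bool:
    """Unfolded transfer rule for one address; arguments are memberships of a."""
    return ((o or sh) == (o2 or sh2)) and implies((not sh) and sh2, o and not o2)


def side_condition(os_: bool, os2: bool, shs: bool, shs2: bool,
                   o: bool, o2: bool, sh: bool, sh2: bool) -> bool:
    """Pointwise version of the subset and set-difference conditions."""
    return (
        implies(os_, o) and implies(os2, o2)          # Os within O, O's within O'
        and implies(shs, sh) and implies(shs2, sh2)   # shs within sh, sh's within sh'
        and ((o and not os_) == (o2 and not os2))     # O \ Os = O' \ O's
        and ((sh and not shs) == (sh2 and not shs2))  # sh \ shs = sh' \ sh's
    )


def generalized_statement_holds() -> bool:
    """Check the implication for all 2^8 membership combinations."""
    for os_, os2, shs, shs2, o, o2, sh, sh2 in product([False, True], repeat=8):
        premise = (transfer(os_, os2, shs, shs2)
                   and side_condition(os_, os2, shs, shs2, o, o2, sh, sh2))
        if premise and not transfer(o, o2, sh, sh2):
            return False
    return True


print(generalized_statement_holds())
```

Dropping one of the two set-difference conditions from `side_condition` makes the check fail, which matches the use of X and Y in both cases of the proof.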
The memory access policy of the extended ownership is close to the one defined
in the previous chapter, where a processor i depending on I/O flag can access read
only memory, shared memory and its owns-set. For the extended ownership we
reduce the set of accessible addresses depending on the processor mode. Instead
of the complete owns-set, the handler and the program thread running on processor
i are allowed to access only the handler owns-set and the program thread owns-set
respectively. In addition, both are allowed to access a portion of the stack.
Definition 4.6 (Memory Access Policy)
The predicate safeaccXT defines our policy for reading and writing memory
addresses by a processor i based on the extended ownership setting o. R and W
represent the sets of addresses to be read and written respectively. io and ih
are flags denoting the execution of an I/O step and a handler step respectively.
rsp is the value of the processor’s stack pointer.
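\[
\mathit{safeacc}_{XT}(i \in Pid,\ io \in \mathbb{B},\ ih \in \mathbb{B},\ rsp \in \mathbb{B}^{32},\ R \in 2^{\mathbb{B}^{32}},\ W \in 2^{\mathbb{B}^{32}},\ o \in O_{XT}) \stackrel{\mathrm{def}}{=}
\begin{cases}
\begin{aligned}
& R \subseteq o.O^h_i \cup o.A_{sh} \cup o.A_{ro} \cup stack(o.O^s_i, ih, rsp) \\
& \land\ W \subseteq o.O^h_i \cup (o.A_{sh} \setminus o.O^h_i) \cup stack(o.O^s_i, ih, rsp)
\end{aligned} & \text{if } ih \land io \\[1ex]
\begin{aligned}
& R \subseteq o.O^h_i \cup o.A_{ro} \cup stack(o.O^s_i, ih, rsp) \\
& \land\ W \subseteq o.O^h_i \setminus o.A_{sh} \cup stack(o.O^s_i, ih, rsp)
\end{aligned} & \text{if } ih \land \lnot io \\[1ex]
\begin{aligned}
& R \subseteq o.O^p_i \cup o.A_{sh} \cup o.A_{ro} \cup stack(o.O^s_i, ih, rsp) \\
& \land\ W \subseteq o.O^p_i \cup (o.A_{sh} \setminus o.O^p_i) \cup stack(o.O^s_i, ih, rsp)
\end{aligned} & \text{if } \lnot ih \land io \\[1ex]
\begin{aligned}
& R \subseteq o.O^p_i \cup o.A_{ro} \cup stack(o.O^s_i, ih, rsp) \\
& \land\ W \subseteq o.O^p_i \setminus o.A_{sh} \cup stack(o.O^s_i, ih, rsp)
\end{aligned} & \text{if } \lnot ih \land \lnot io
\end{cases}
\]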
where the function stack defines the stack region of the program thread to
contain all addresses above the stack pointer parameter and the stack region for
the handler to contain all addresses under the stack pointer parameter¹.
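\[
stack(S \in 2^{\mathbb{B}^{32}},\ ih \in \mathbb{B},\ rsp \in \mathbb{B}^{32}) \in 2^{\mathbb{B}^{32}} \stackrel{\mathrm{def}}{=}
\begin{cases}
\{a \mid a \in S \land a \le rsp\} & \text{if } ih \\
\{a \mid a \in S \land a > rsp\} & \text{if } \lnot ih
\end{cases}
\]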
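To make the four cases of the access policy concrete, the following minimal sketch models the ownership setting of a single processor with plain Python sets of integer addresses. The class `OwnershipXT`, its field names and the function `safeacc_xt` are hypothetical names chosen for this illustration. The sketch also assumes that in the non-I/O write case the set difference binds tighter than the union, i.e. (owned \ shared) ∪ stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipXT:
    """Extended ownership of one processor i (field names are assumptions)."""
    handler_owned: frozenset  # o.O^h_i
    program_owned: frozenset  # o.O^p_i
    stack_owned: frozenset    # o.O^s_i
    shared: frozenset         # o.A_sh
    read_only: frozenset      # o.A_ro


def stack(S, ih, rsp):
    """Handler region: addresses at or below rsp; program region: above rsp."""
    if ih:
        return {a for a in S if a <= rsp}
    return {a for a in S if a > rsp}


def safeacc_xt(io, ih, rsp, R, W, o):
    """Check read set R and write set W against the policy of Definition 4.6."""
    owned = o.handler_owned if ih else o.program_owned
    stk = stack(o.stack_owned, ih, rsp)
    if io:
        readable = owned | o.shared | o.read_only | stk
        writable = owned | (o.shared - owned) | stk
    else:
        readable = owned | o.read_only | stk
        # assumed reading: (owned \ shared) ∪ stack
        writable = (owned - o.shared) | stk
    return set(R) <= readable and set(W) <= writable
```

For example, with a stack owns-set of addresses 100 to 109 and rsp = 105, a program thread step may touch addresses 106 to 109 of the stack, while a handler step may touch addresses 100 to 105.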
Lemma 4.3 (Safe access refinement) If a memory access is ownership safe ac-
cording to an extended ownership o, then it is also ownership safe according to
the conjoint ownership join(o).
∀i ∈ Pid , io ∈ B, ih ∈ B, rsp ∈ B32, R,W ∈ 2B32 , o ∈ OXT .
safeaccXT (i, io, ih, rsp,R,W, o) =⇒ safeacc(i, io, R,W, join(o))
Proof The proof considers four cases ¬ih ∧ ¬io, ih ∧ ¬io, ¬ih ∧ io and ih ∧ io.
In all of them we show that the largest possible reads and writes sets that satisfy
safeaccXT are subsets of the ones defined for the conjoint ownership by safeacc.
△
¹ In later definitions we have to instantiate the rsp parameter of the stack function properly,
either with the stack register of the core or with the stack register of the old-core.
4.1.2 Safe Execution
Definition 4.7 (Safe MIPSP Step) A MIPSP step defined by input α and starting
in configuration h is ownership-safe according to a pair of extended ownership
settings o and o′ if it obeys the memory access policy and maintains the
ownership invariant.
safestepXT (h ∈ CM , α ∈ ΣM , o ∈ OXT , o′ ∈ OXT ) def=
safeaccXT (α.pid, IOstep(h, α), isr, rsp, reads(h, α), writes(h, α), o)
∧ safetransferXT (α.pid, IOstep(h, α), isr, o, o′)
where functions reads(h, α) and writes(h, α), which denote the set of ad-
dresses which are read and respectively written by α, are defined as in Section
3.2. The predicate IOstep(h, α) denotes whether the next step is an I/O step.
isr denotes whether we are in an IPI service routine.
\[
isr = h.isr_{\alpha.pid}[0] \lor jisr_{IPI}(\alpha)
\]
Note that a JISR step counts as the first ISR step with respect to ownership
safety and transfer.
By rsp we denote the stack pointer used to split the stack.
\[
rsp =
\begin{cases}
h.core_{\alpha.pid}.rsp & \text{if } \lnot isr \\
h.oCore_{\alpha.pid}.rsp & \text{if } isr
\end{cases}
\]
Definition 4.8 (Safe Execution Sequence) A MIPSP execution sequence defined by
initial configuration h and input sequence β is ownership-safe according to
initial extended ownership o and final extended ownership o′ if there exists an
extended ownership sequence such that every step of the execution is
ownership-safe.
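\[
\begin{aligned}
\mathit{safeseq}_{XT}(h \in C_M,\ & \beta \in (\Sigma_M)^*,\ o \in O_{XT},\ o' \in O_{XT}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& \beta = \varepsilon \land o = o' \land \text{ownership-inv}_{XT}(o) \\
& \lor\ \exists o_0, \ldots, o_n \in O_{XT}.\ o_0 = o \land o_n = o' \\
& \quad \land\ \forall i < n.\ \text{ownership-inv}_{XT}(o_i) \land (\mathit{safestep}_{XT}(h^i, \beta_i, o_i, o_{i+1}) \lor ipi(\beta_i))
\end{aligned}
\]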
Lemma 4.4 (Safe Step Sequence Refinement) If an execution sequence is ownership
safe according to the extended ownership, then it is ownership safe according
to the conjoint ownership too.
\[
\forall h \in C_M,\ \beta \in (\Sigma_M)^*,\ ox, ox' \in O_{XT}.\
\mathit{safeseq}_{XT}(h, \beta, ox, ox') \implies \mathit{safeseq}_{MIPSP}(h, \beta, join(ox), join(ox'))
\]
Proof We unfold the definitions and apply Lemmas 4.1, 4.2 and 4.3.
△
Figure 4.5: Reordering of execution sequence of processor i that contains IPI
service routine.
4.2 Reordering Proof
In this section we state and prove a reordering theorem in which local steps of the
program placed between JISR and the preceding interleaving point are reordered
after the service routine. As Figure 4.5 shows, this implies that in the reordered
execution JISR happens at an interleaving point.
We want to prove that every MIPSP execution containing IPI handling can
be simulated by an execution in which the JISR step is reordered to the preceding
interleaving point. Furthermore we want to prove that if the reordered execution
is ownership-safe then also the original execution is ownership-safe. We do not
need to examine the complete executions but only the parts including ISR of IPIs.
These sub-sequences start at the last interleaving point before JISR and end at
the second interleaving point after eret, i.e. the first interleaving point of the
program thread after ISR.
We continue in Subsection 4.2.1 with the definitions of some predicates and
lemmas, which we need for stating and proving our reordering theorem. In
Subsection 4.2.2 we define a simulation relation between the configurations of the
original execution and the one with reordered ISR. In Subsection 4.2.3 we prove
that the simulation relation holds for every step of the original execution. In the
proof we rely on properties of the ISR. These are stated in the following chapter
as Software Conditions 5 and 6.
4.2.1 Auxiliary Definitions
As we previously mentioned, interrupt handling contains a context switch from the
program thread to the interrupt handler thread. The context of the interrupted process
is saved to and restored from a process control block (PCB). In general a PCB is
a kernel data structure in which the processor registers are saved. It is used to
store the context of users and the kernel on process switches. To avoid ambiguity
we call our data structure an interrupt PCB or IPCB. While PCBs are allocated
as C variables IPCBs are not present on the C-IL level. They represent a set of
memory addresses, which are always assigned to the interrupt service routine.
Software Condition 4 (IPCB Ownership) IPCB addresses always stay in the
handler thread memory and are excluded from the ownership transfer.
\[
A_{ipcb_i} \subset o.O^h_i,
\]
where $A_{ipcb_i}$ is defined in Definition 4.9.
Later, when defining the invariant between the ownership models on the different
levels in Definition 5.79 and in Software Condition 10, we rely on the IPCB
characteristics stated above.
Definition 4.9 (IPCB) An IPCB offers space for all general and special purpose
registers, and the program counter. Thus the IPCB size is 65 · 4 bytes. During
hypervisor boot a memory region $A_{ipcb}$ starting at IPCBSBA is allocated for
IPCBs, one IPCB per processor.
\[
A_{ipcb} \stackrel{\mathrm{def}}{=} [\mathrm{IPCBSBA} : \mathrm{IPCBSBA} +_{32} bin_{32}(np \cdot 65 \cdot 4)]
\]
where $np \in \mathbb{N}$ denotes the number of processors in the system.
The base address of the IPCB of processor i we denote by
\[
ipcbba(i) \stackrel{\mathrm{def}}{=} \mathrm{IPCBSBA} +_{32} bin_{32}(i \cdot 65 \cdot 4)
\]
The set of addresses occupied by the IPCB of processor i we denote by
\[
A_{ipcb_i} \stackrel{\mathrm{def}}{=} [ipcbba(i) : ipcbba(i) +_{32} bin_{32}(65 \cdot 4)]
\]
We refer to the IPCB of processor i by
\[
ipcb(h, i) \stackrel{\mathrm{def}}{=} h.m_{65 \cdot 4}(ipcbba(i))
\]
Definition 4.10 (IPCB Registers) We define the function $reg_i$ to compute for a
given IPCB address a the index of the corresponding GPR.
\[
reg_i(a \in \mathbb{B}^{32}) \in \mathbb{B}^5 \stackrel{\mathrm{def}}{=} (a[31:2]00 -_{32} ipcbba(i))[6:2]
\]
The function $ia_i$ computes for a given register index r the corresponding
aligned address in the IPCB.
\[
ia_i(r \in \mathbb{B}^5) \in \mathbb{B}^{32} \stackrel{\mathrm{def}}{=} (0^{25} r 00 +_{32} ipcbba(i))
\]
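The address arithmetic of Definitions 4.9 and 4.10 can be illustrated with ordinary 32-bit modular arithmetic. This is a minimal sketch under stated assumptions: bit strings are modeled as Python integers, the value passed as `ipcbsba` is an arbitrary placeholder for IPCBSBA, and the function names are hypothetical. The round trip from a register index to its IPCB address and back is the property the sketch is meant to show.

```python
MASK32 = 0xFFFFFFFF
IPCB_BYTES = 65 * 4  # IPCB size in bytes, as stated in Definition 4.9


def add32(x, y):
    """Addition modulo 2^32, modeling +32."""
    return (x + y) & MASK32


def sub32(x, y):
    """Subtraction modulo 2^32, modeling -32."""
    return (x - y) & MASK32


def ipcb_base(ipcbsba, i):
    """Base address ipcbba(i) of the IPCB of processor i."""
    return add32(ipcbsba, i * IPCB_BYTES)


def reg_index(a, ipcbsba, i):
    """reg_i(a): word-align a, subtract the base, take bits [6:2]."""
    aligned = (a & ~0x3) & MASK32
    offset = sub32(aligned, ipcb_base(ipcbsba, i))
    return (offset >> 2) & 0x1F


def ipcb_addr(r, ipcbsba, i):
    """ia_i(r): the 5-bit index r shifted to a word offset, plus the base."""
    return add32((r & 0x1F) << 2, ipcb_base(ipcbsba, i))
```

With the placeholder base address `0x10000`, `ipcb_base(0x10000, 2)` is `0x10000 + 2 * 260`, and `reg_index(ipcb_addr(r, 0x10000, 2), 0x10000, 2)` returns `r` for every register index from 0 to 31.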
Definition 4.11 (Local Step) The predicate Lstep denotes whether a given step α
executed in a configuration h is a local core step of processor i.
\[
Lstep_i(h \in C_M,\ \alpha \in \Sigma_M) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \alpha = (core, i, 0) \land \lnot IOstep(h, \alpha)
\]
Definition 4.12 (Local Steps Sequence) The predicate Lsteps denotes whether a
given step sequence β executed in a configuration $h^0$ contains only local core
steps of processor i.
\[
Lsteps_i(h^0 \in C_M,\ \beta \in (\Sigma_M)^*) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \forall k < |\beta|.\ Lstep_i(h^k, \beta_k)
\]
Definition 4.13 (Local Equality, Program Thread) The predicate $eq^L$ denotes
whether in two configurations h and d the local components of the program thread
on processor i according to the given ownership are equal during program thread
execution (i.e. clear ISR bit). The local components, i.e. components accessed
by local steps, are the core, the owned addresses and the allocated portion of
the stack.
\[
\begin{aligned}
eq^L_i(h \in C_M,\ d \in C_M,\ o \in O_{XT}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}\ & h.core_i.pc = d.core_i.pc \\
& \land\ h.core_i.gpr = d.core_i.gpr \\
& \land\ \forall r \in \{sr, mode\}.\ d.core_i.spr(r) = h.core_i.spr(r) \\
& \land\ \forall a \in o.O^p_i \cup stack(o.O^s_i, 0, h.core_i.rsp).\ h.M(a) = d.M(a)
\end{aligned}
\]
Note that the only SPRs relevant for steps of the program thread are the
mode and the status registers.
Before defining the local equality for the handler we want to consider an
important property of the ISR, i.e. that its execution does not depend on GPR
content belonging to the program thread. In other words the GPR content of the
configuration, in which JISR happens, does not influence the ISR execution. In
the next software condition we express that property.
Software Condition 5 (ISR GPR) ISR reads of GPRs are either preceded by an ISR
write to the same register or belong to the code saving the context of the
program thread. ISR writes to GPRs cannot precede the saving of the destination
register in the IPCB².
We rely on this elementary ISR property in our simulation proof. It implies
that the local components of the handler are a subset of what we consider to be the
program thread local components. For instance, some GPRs are excluded
according to Software Condition 5. To integrate that property formally into
our definitions, we need parameters for both the list of registers saved by the ISR
in the IPCB (denoted by $sv \in 2^{\mathbb{B}^5}$) and the registers written by the ISR (denoted by
$ch \in 2^{\mathbb{B}^5}$). Using these lists we are able to specify the components which are
relevant to the execution of handler steps.

² For that purpose the corresponding IPCB address should fit in 16 bits. Otherwise the ISR
save context should initially load ipcbba(i) into a dedicated register, whose content must be
saved temporarily on the stack and then moved into the IPCB, and this software condition must
be refined.
Definition 4.14 (Local Equality ISR) Two configurations h and d are locally
equal with respect to the handler thread on processor i and the given ownership
if the following properties hold.
• The program counters are equal in both machines.
h.corei.pc = d.corei.pc
• The registers are changed in the same way in both machines.
\[
\begin{aligned}
gpreq(h \in C_M,\ d \in C_M,\ & i \in Pid,\ ch \in 2^{\mathbb{B}^5},\ sv \in 2^{\mathbb{B}^5}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& \forall r \in [0:31].\ (d.core_i.gpr(r) \ne d.oCore_i.gpr(r) \iff r \in ch) \\
& \quad \land\ (r \in ch \implies d.core_i.gpr(r) = h.core_i.gpr(r)) \\
& \quad \land\ (r \notin ch \implies h.core_i.gpr(r) = h.oCore_i.gpr(r)) \\
& \land\ ch \subseteq sv
\end{aligned}
\]
• The SPR stores the same values on both machines, except for epc, esr and
emode. Throughout the ISR epc, esr and emode store the address of the
interrupted instruction, the sr and mode values from the state before JISR
respectively.
spreq(h ∈ CM , d ∈ CM , i ∈ Pid) ∈ B def=
(∀r ∈ {sr, eca, edata,mode}. d.corei.spr(r) = h.corei.spr(r))
∧ d.corei.spr(esr) = d.oCorei.spr(sr)
∧ d.corei.spr(emode) = d.oCorei.spr(mode)
∧ d.corei.spr(epc) = d.oCorei.pc
∧ h.corei.spr(esr) = h.oCorei.spr(sr)
∧ h.corei.spr(emode) = h.oCorei.spr(mode)
∧ h.corei.spr(epc) = h.oCorei.pc
• The handler thread memory except the IPCB region stores the same values.
∀a ∈ o.Ohi \ Aipcb i. h.M(a) = d.M(a)
We exclude the IPCB addresses from the memory equality, having in mind
our intended simulation relation, and handle it separately. In the original
and in the reordered executions the ISR begins in different states. This
implies that in the reordered execution the values stored in the IPCBs will
be different than the ones stored in the original execution.
Figure 4.6: Stack region allocation during ISR. On the left side is the stack in the
reordered execution. On the right side is the original stack with some extra space
allocated by the postponed steps of the program thread.
• IPCB registers store the GPR content of the configuration before JISR, i.e.
IPCB registers are equal to the GPRs in the old-core.
ipcbeq(h ∈ CM , d ∈ CM , i ∈ Pid , sv ∈ 2B5) ∈ B def=
∀a ∈ Aipcb i. regi(a) ∈ sv =⇒
h.M4(a[31 : 2]00) = h.oCorei.gpr(regi(a))
∧ d.M4(a[31 : 2]00) = d.oCorei.gpr(regi(a))
• The stack memory of the handler thread stores the same values on both
machines. The handler stack might be allocated at different places in both
machines. We define this displacement as the difference between the stack
pointers of the interrupted program (Figure 4.6).
\[
displ = d.oCore_i.rsp - h.oCore_i.rsp
\]
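The predicate $eq^{LH}$ denotes handler thread local equality.
\[
\begin{aligned}
eq^{LH}_i(h \in C_M,\ d \in C_M,\ & o \in O_{XT},\ ch \in 2^{\mathbb{B}^5},\ sv \in 2^{\mathbb{B}^5}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& h.core_i.pc = d.core_i.pc \\
& \land\ gpreq(h, d, i, ch, sv) \\
& \land\ spreq(h, d, i) \\
& \land\ \forall a \in o.O^h_i \setminus A_{ipcb_i}.\ h.M(a) = d.M(a) \\
& \land\ ipcbeq(h, d, i, sv) \\
& \land\ \forall a \in stack(o.O^s_i, 1, h.oCore_i.gpr(rsp)) \land a > h.core_i.gpr(rsp). \\
& \qquad h.M(a) = d.M(a + displ)
\end{aligned}
\]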
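The last conjunct compares the handler stack of both machines modulo the displacement. The following minimal sketch illustrates that clause only. It assumes memories are modeled as Python dictionaries from addresses to values with absent addresses reading as 0, and the function name `handler_stack_equal` and its parameters are hypothetical.

```python
def handler_stack_equal(h_mem, d_mem, stack_owned, h_old_rsp, d_old_rsp, h_rsp):
    """Compare the handler stack of h with the displaced handler stack of d.

    h_old_rsp and d_old_rsp model the stack pointers in the old-cores
    (the interrupted program), h_rsp the current stack pointer of h.
    """
    displ = d_old_rsp - h_old_rsp
    # handler region of h: owned stack addresses at or below the old stack
    # pointer and strictly above the current stack pointer
    region = [a for a in stack_owned if h_rsp < a <= h_old_rsp]
    return all(h_mem.get(a, 0) == d_mem.get(a + displ, 0) for a in region)
```

In the situation of Figure 4.6 the original machine h has allocated extra stack space through the postponed program thread steps, so its old stack pointer is lower and the displacement is positive.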
Definition 4.15 (Environment Equality) The predicate $eq^E$ denotes whether in
two configurations the components of other processors according to the given
ownership are equal. The components of other processors, i.e. components
accessed by local steps of other units, are the processors and the owned
addresses.
\[
eq^E_i(h, d, o) \stackrel{\mathrm{def}}{=} \forall j \ne i.\ h.cpu_j = d.cpu_j \land \forall a \in o.O_j.\ h.M(a) = d.M(a)
\]
Definition 4.16 (Shared Equality) The predicate $eq^{Sh}$ denotes whether in two
configurations the shared components according to the given ownership are equal.
The shared components, i.e. components accessed by I/O steps of processor i,
are the APIC, the TLB and the shared addresses.
\[
eq^{Sh}_i(h, d, o) \stackrel{\mathrm{def}}{=} h.apic_i = d.apic_i \land h.tlb_i = d.tlb_i \land \forall a \in o.A_{sh}.\ h.M(a) = d.M(a)
\]
The following three lemmas express basic properties of local and I/O steps.
Similar lemmas are proven in [Bau14]. Since they do not talk about ISR steps,
their proofs are similar to the proofs in [Bau14] and we skip them here.
Note that the lemmas require the ownership safety of the considered step. We
extend Lemma 4.5 by the condition that the executed step is a program thread
step. Later we define a similar lemma for local steps of the ISR. Lemma 4.6
and Lemma 4.7 do not need a case distinction between a program thread and a
handler execution.
Lemma 4.5 (Local Program Thread Steps: Locality) If we execute the same
program thread local step on two configurations, then they are locally equal if and
only if they were locally equal before the step.
Lstepi(h, α) ∧ safestepXT (h, α, o, o) ∧ ¬isr(h.apicα.pid) =⇒
(eqLi (h, d, o) =⇒ eqLi (∆(h, α),∆(d, α), o))
Lemma 4.6 (Local Steps: Environment) If h′ is the configuration obtained af-
ter executing a local step of given processor i in configuration h then h′ is equal
to given configuration d according to shared components and components local
to other processors if and only if the same holds for h.
Lstepi(h, α) ∧ safestepXT (h, α, o, o) =⇒
(eqEi (h, d, o)⇐⇒ eqEi (∆(h, α), d, o))
∧ (eqShi (h, d, o)⇐⇒ eqShi (∆(h, α), d, o))
Lemma 4.7 (I/O Steps) If we execute the same I/O step α on two configura-
tions h and d, then the resulting configurations are locally and shared equivalent
if and only if h and d are locally and shared equal. Furthermore the execution
of α on one of the machines preserves the equality of components local to other
processors.
α = (core, i, 0) ∧ IOstep(h, α) ∧ safestepXT (h, α, o, o′) =⇒
(eqLi (h, d, o) ∧ eqShi (h, d, o)
=⇒ eqLi (∆(h, α),∆(d, α), o′) ∧ eqShi (∆(h, α),∆(d, α), o′))
∧ (eqEi (h, d, o)⇐⇒ eqEi (∆(h, α),∆(d, α), o′))
Next we define the desired execution schedule.
Definition 4.17 (schedJIP Schedule) The predicate schedJIP denotes that every
IPI service routine of a given processor i starts at an interleaving point.
\[
schedJIP(h \in \mathrm{MIPSP},\ \beta \in (\Sigma_M)^*,\ i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=}
\forall k \in [0 : |\beta| - 1].\ jisr_{IPI}(\beta_k, i) \implies IP(h, \beta, k)
\]
Definition 4.18 (Extended IP Schedule Single Processor) The predicate
$schedXT_i$ denotes that in a given execution sequence, steps of different units
are interleaved only at interleaving points and every IPI service routine of a
given processor i starts at an interleaving point.
\[
schedXT_i(h \in C_M,\ \beta \in (\Sigma_M)^*,\ i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=} IPsched(h, \beta) \land schedJIP(h, \beta, i)
\]
Definition 4.19 (Extended IP Schedule) The predicate schedXT denotes that in a
given execution sequence, steps of different units are interleaved only at
interleaving points and every IPI service routine starts at an interleaving
point.
\[
schedXT(h \in C_M,\ \beta \in (\Sigma_M)^*) \in \mathbb{B} \stackrel{\mathrm{def}}{=} IPsched(h, \beta) \land \forall i \in Pid.\ schedJIP(h, \beta, i)
\]
Definition 4.20 (Program Thread Interleaving Point) The predicate $IP_p$ denotes
whether a given interleaving point belongs to a program thread or to an IPI ISR.
Interleaving points which belong to an IPI ISR are defined by the APIC state,
i.e. by the apic.isr bits, or by the preceding eret step (Figure 4.7).
\[
\begin{aligned}
IP_p(h^0 \in C_M,\ & \beta \in (\Sigma_M)^*,\ i \in Pid,\ n \in \mathbb{N}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& IP(h^0, \beta, n) \land \lnot isr(h^n.apic_i) \land h^n.mode_i[0] = 0 \land \lnot eret_{IPI}(h^{n-1}, \beta_{n-1}, i)
\end{aligned}
\]
Definition 4.21 (ISR Execution Sequence) The predicate ISRseq denotes whether a
given execution (sub-)sequence β of length n is an IPI ISR of processor i.
Figure 4.7: Execution sequence with two IPI ISRs of processor i. All interleaving
points in which the corresponding ISR bit in the APIC is set and all interleaving
points in which an eret step ends belong to the ISRs.

Figure 4.8: ISR block of processor i defined by two subsequent interleaving points
of the program thread and an IPI ISR. The IPI ISR steps and the subsequent steps
of other processors or IPI will not be reordered.
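\[
\begin{aligned}
ISRseq(h^0 \in C_M,\ & \beta \in (\Sigma_M)^*,\ i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& jisr_{IPI}(\beta_0, i) \land eret_{IPI}(h^{n-1}, \beta_{n-1}, i) \\
& \land\ (\lnot\exists k \in [1 : n-2].\ (jisr_{IPI}(\beta_k, i) \lor eret_{IPI}(h^k, \beta_k, i)))
\end{aligned}
\]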
Since eret is an I/O step and ends in an interleaving point, the subsequent local
step of processor i may be preceded by interleaved steps of other processors or IPI
steps. These interleaved steps cannot be reordered; therefore we define another
predicate to cover them together with the ISR sequence preceding them.
Definition 4.22 (ISR Execution Sequence XT) The predicate ISRseqXT denotes
whether a given execution (sub-)sequence β of length n is an IPI ISR of
processor i followed by steps of other processors or IPI steps.
\[
\begin{aligned}
ISRseqXT(h^0 \in C_M,\ & \beta \in (\Sigma_M)^*,\ i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& \exists m \in [0 : n-1].\ ISRseq(h^0, \beta[0 : m], i) \\
& \quad \land\ (\forall k \in [m+1 : n-1].\ \lnot(\beta_k.pid = i \land core(\beta_k)))
\end{aligned}
\]
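The two sequence predicates can be illustrated on a simplified step model. This is a minimal sketch under stated assumptions: a step is reduced to a processor index and a kind label, the configuration argument is dropped because the labels already say whether a step is JISR or eret, and all names (`Step`, `is_isr_seq`, `is_isr_seq_xt`, `is_core_step_of`) are hypothetical.

```python
from collections import namedtuple

# Hypothetical step model: kind is "L" (local), "IO", "JISR", "ERET" or "IPI".
Step = namedtuple("Step", ["pid", "kind"])


def is_core_step_of(step, i):
    """A core step of processor i: same pid and not an IPI delivery step."""
    return step.pid == i and step.kind != "IPI"


def is_isr_seq(beta, i):
    """ISRseq: starts with JISR of i, ends with eret of i, none in between."""
    def jisr(s):
        return s.pid == i and s.kind == "JISR"

    def eret(s):
        return s.pid == i and s.kind == "ERET"

    if len(beta) < 2:
        return False
    return (jisr(beta[0]) and eret(beta[-1])
            and not any(jisr(s) or eret(s) for s in beta[1:-1]))


def is_isr_seq_xt(beta, i):
    """ISRseqXT: an ISR sequence followed only by steps that are not core
    steps of processor i (steps of other processors or IPI steps)."""
    return any(
        is_isr_seq(beta[:m + 1], i)
        and all(not is_core_step_of(s, i) for s in beta[m + 1:])
        for m in range(len(beta))
    )
```

In this model a sequence JISR, local handler step, eret of processor 0 satisfies both predicates. Appending a step of processor 1 keeps only the extended predicate true, and appending a local step of processor 0 makes both false.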
Definition 4.23 (ISR Execution Block) The predicate ISRblock denotes whether a
given execution (sub-)sequence is an ISR block of processor i. By an ISR block
we refer to a sequence of steps of a given processor between two subsequent
program thread interleaving points which contains at least one ISR of processor
i. An ISR block contains several interleaving points but only two (the first and
the final one) are outside the ISR (Figure 4.8).
These conditions on the interleaving points in an ISR imply that the ISRblock
predicate is fulfilled by sequences, which contain:
• only complete ISRs of processor i, i.e. ISRs that start and end within the
considered sub-sequence,
• only local steps of the same processor before the first JISR step,
• only local steps of the same processor between two IPI ISRs (if the block
contains several IPI ISRs),
• a sequence of local steps of i after the last eret ,
• and not more than one I/O step of i at the end of the sequence.
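\[
\begin{aligned}
ISRblock(h^0 \in C_M,\ & \beta \in (\Sigma_M)^*,\ i \in Pid) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& IPsched(h^0, \beta) \\
& \land\ IP_p(h^0, \beta, i, 0) \land IP_p(h^0, \beta, i, n) \\
& \land\ (\lnot\exists l \in [1 : n-1].\ IP_p(h^0, \beta, i, l)) \\
& \land\ \exists j, k \in [0 : n-1].\ ISRseq(h^j, \beta[j : k], i) \\
& \land\ \forall j \in [0 : n-1].\ jisr_{IPI}(\beta_j, i) \implies \exists k \in [0 : n-1].\ ISRseq(h^j, \beta[j : k], i) \\
& \land\ \forall j \in [0 : n-1].\ eret_{IPI}(h^j, \beta_j, i) \implies \exists k \in [0 : n-1].\ ISRseq(h^k, \beta[k : j], i)
\end{aligned}
\]
where $n = |\beta|$.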
Lemma 4.8 (ISR Block Steps) Every ISR block of a given processor i contains
only core steps of processor i in system mode (i.e. no guest or VMEXIT steps of
processor i) and not more than one program thread I/O step³.
∀β, h0, i.
IPsched(h0, β)
∧ ISRblock(h0, β, i)
=⇒ ∀k ∈ [0 : n− 1]. (βk /∈ {(vmexit, i), (guest, i)}
∧ (IOstep(hk, βk) ∧ βk.pid = i =⇒ IPp(h0, β, i, k)))
where n = |β|.
Proof The lemma is trivially proven by the IP schedule condition and the corre-
sponding distribution of I/O steps and IP points.
△
We aim at reordering all IPI ISR blocks in an original execution, without
changing the order of all other steps.
Definition 4.24 (Reorder ISR Block) The function reoISRb returns for a given
IPI ISR block β a reordered execution sequence equivalent to β according to the
following rules:
• The output sequence contains the same steps as β.
• The order of ISR steps is unchanged.
• The order of program thread steps is unchanged.
• The order of interleaved steps is unchanged.
• Local steps preceding an ISR are reordered after the ISR (Figure 4.9). In
case of several ISRs, all local steps are reordered after the last ISR.
We have to define the function recursively in order to cover the cases with
several IPI ISRs. We identify the first ISR in the original sequence β and require
that the reordered sequence begins with it. Then we apply recursively the same
step on the sub-sequence achieved from β after removing the first ISR and the
subsequent interleaved steps. Note that an ISR block with several ISRs still
satisfies the ISRblock predicate after removing the steps of one ISR. After the
last ISR has been removed, only program thread steps remain in the sub-sequence.

³ JISR and eret steps count as interrupt thread steps.

Figure 4.9: ISR block reordering.
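\[
reoISRb(h^0 \in C_M,\ \beta \in (\Sigma_M)^*,\ i \in Pid) \in (\Sigma_M)^* \stackrel{\mathrm{def}}{=}
\begin{cases}
\beta[k : m]\ reoISRb(h^0, \beta[0 : k-1]\,\beta[m+1 : n], i) &
\begin{aligned}
& \text{if } ISRblock(h^0, \beta, i) \\
& \land\ ISRseqXT(h^k, \beta[k : m], i) \\
& \land\ (k = 0 \lor Lsteps_i(h^0, \beta[0 : k-1])) \\
& \land\ (m < n \implies core(\beta_{m+1}) \land \beta_{m+1}.pid = i)
\end{aligned} \\[1ex]
\beta & \text{otherwise}
\end{cases}
\]
where
\[
n = |\beta| - 1
\]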
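The recursion of reoISRb can be sketched on a simplified step model. The following is an illustration under stated assumptions, not the formal function: steps carry only a processor index, a kind label and a tag, the ISRblock and IP schedule side conditions are not checked, and the names `Step`, `is_core_step_of` and `reorder_isr_block` are hypothetical.

```python
from collections import namedtuple

# Hypothetical step model: kind is "L" (local), "IO", "JISR", "ERET" or "IPI";
# tag only names the step so that the resulting order can be inspected.
Step = namedtuple("Step", ["pid", "kind", "tag"])


def is_core_step_of(step, i):
    return step.pid == i and step.kind != "IPI"


def reorder_isr_block(beta, i):
    """Move the first ISR (plus trailing interleaved steps) to the front and
    recurse on the remaining steps, mirroring the recursive definition."""
    beta = list(beta)
    # k: index of the first JISR of processor i
    k = next((x for x, s in enumerate(beta)
              if s.pid == i and s.kind == "JISR"), None)
    if k is None:
        return beta  # no ISR left: only program thread steps remain
    # all steps before the JISR must be local steps of processor i
    if any(not (s.pid == i and s.kind == "L") for s in beta[:k]):
        return beta
    # e: index of the eret that closes this ISR
    e = next((x for x in range(k + 1, len(beta))
              if beta[x].pid == i and beta[x].kind == "ERET"), None)
    if e is None:
        return beta
    # m: extend the segment over interleaved steps following the eret
    m = e
    while m + 1 < len(beta) and not is_core_step_of(beta[m + 1], i):
        m += 1
    rest = beta[:k] + beta[m + 1:]
    return beta[k:m + 1] + reorder_isr_block(rest, i)
```

For a block with local steps a, b, then an ISR, an interleaved step of another processor, and further program thread steps, the sketch places the ISR and the interleaved step first and the steps a, b directly before the remaining program thread steps. With two ISRs in the block, all local steps end up after the second ISR.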
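Definition 4.25 (Select Processor Steps) The function $steps_i$ returns for a
given execution sequence β the subsequence of steps of a given processor i.
\[
steps_i(\beta \in (\Sigma_M)^*,\ i \in Pid) \in (\Sigma_M)^* \stackrel{\mathrm{def}}{=}
\begin{cases}
\varepsilon & \text{if } \beta = \varepsilon \\
\beta_0\ steps_i(\beta[1 : |\beta| - 1], i) & \text{if } \beta_0.pid = i \\
steps_i(\beta[1 : |\beta| - 1], i) & \text{otherwise}
\end{cases}
\]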
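Definition 4.26 (Select I/O Steps) The function $steps_{I/O}$ returns for a given
execution sequence β and an initial state $h^0$ the subsequence of I/O steps.
\[
steps_{I/O}(h^0 \in C_M,\ \beta \in (\Sigma_M)^*) \in (\Sigma_M)^* \stackrel{\mathrm{def}}{=}
\begin{cases}
\varepsilon & \text{if } \beta = \varepsilon \\
\beta_0\ steps_{I/O}(h^1, \beta[1 : |\beta| - 1]) & \text{if } IOstep(h^0, \beta_0) \\
steps_{I/O}(h^1, \beta[1 : |\beta| - 1]) & \text{otherwise}
\end{cases}
\]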
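Both selection functions are filters over the step sequence. The only difference is that the I/O filter has to advance the configuration, because whether a step is an I/O step depends on the configuration in which it is executed. The following minimal sketch writes them iteratively. The step model and the parameters `is_io` and `delta` (standing in for IOstep and the transition function) are assumptions made for this illustration.

```python
from collections import namedtuple

# Hypothetical step model with a processor index and a kind label.
Step = namedtuple("Step", ["pid", "kind"])


def steps_of(beta, i):
    """Subsequence of the steps of processor i, in their original order."""
    return [s for s in beta if s.pid == i]


def io_steps(h0, beta, is_io, delta):
    """Subsequence of I/O steps of beta when executed from configuration h0.

    is_io(h, s) decides whether s is an I/O step in configuration h,
    delta(h, s) returns the configuration after executing s in h.
    """
    selected = []
    h = h0
    for s in beta:
        if is_io(h, s):
            selected.append(s)
        h = delta(h, s)
    return selected
```

A usage example with a trivial configuration (a step counter) would be `io_steps(0, beta, lambda h, s: s.kind == "IO", lambda h, s: h + 1)`.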
Lemma 4.9 (Reorder ISR Block) If β is a MIPSP execution sequence starting
in $h^0$ that is an IP schedule and an ISR block of a given processor i, and we apply
reoISRb to reorder β, then the resulting sequence satisfies the extended schedule
predicate for processor i. The order of I/O steps and the local order of steps for
other processors are preserved in the reordered schedule. Furthermore, all
interleaved steps are moved by the same number of positions; thus the reordering
does not insert new interleaving, i.e. interleaved blocks of other processors stay
unchanged.
\[
\begin{aligned}
\forall \beta, h^0, i.\ & IPsched(h^0, \beta) \land ISRblock(h^0, \beta, i) \\
\implies\ & schedXT_i(h^0, reoISRb(h^0, \beta, i)) \\
& \land\ \forall j \in Pid.\ j \ne i \implies steps_i(\beta, j) = steps_i(reoISRb(h^0, \beta, i), j) \\
& \land\ steps_{I/O}(h^0, \beta) = steps_{I/O}(h^0, reoISRb(h^0, \beta, i)) \\
& \land\ \exists k.\ \forall l.\ \beta_l.pid \ne i \implies \beta_l = reoISRb(h^0, \beta, i)[l - k]
\end{aligned}
\]
Proof By unfolding the definition of reoISRb we can easily prove that every ISR
block within an IP schedule may be reordered into a step sequence which is an
extended schedule, in which the order of interleaved and I/O steps is preserved.
All interleaved steps are moved k positions to the left, where k is the
number of local program thread steps of processor i which are reordered to the
end of the ISR block.
△
The interrupt thread and the program thread run on the same processor,
and we want to guarantee that an IPI ISR does not change local configuration
components related to the execution of the program thread. The memory, i.e.
stack and owned memory addresses, is protected by our extended memory
ownership model. The core is saved and restored at the beginning and at the end of
an ISR, respectively.
Software Condition 6 (Valid ISR) The predicate ISRV denotes the validity of
ISR implementations. It states that the execution of an IPI ISR must be
transparent to the interrupted program thread⁴ and requires that the local states of the
program thread before and after an ISR are equivalent. For an execution sequence
$h \xrightarrow{\beta} h^n$ with $|\beta| = n$ we define:
\[
ISRV(h \in C_M,\ \beta \in (\Sigma_M)^n,\ o \in (O_{XT})^n,\ i \in Pid) \stackrel{\mathrm{def}}{=}
\begin{cases}
\begin{aligned}
& \exists m < (n-1).\ jisr_i(h, \beta, m) \\
& \land\ (\forall k \in [m : n-1].\ o_k.O^p_i = o_m.O^p_i \\
& \qquad \land\ stack(o_k.O^s_i, 0, h^k.oCore_i.rsp) = stack(o_m.O^s_i, 0, h^m.core_i.rsp)) \\
& \land\ (\lnot\exists k \in [m+1 : n-2].\ (jisr_i(h, \beta, k) \lor eret_i(h, \beta, k))) \\
& \land\ eq^L_i(h^n, h^m, o_m) \\
& \land\ ISRV(h, \beta[0 : m-1], o[0 : m-1], i)
\end{aligned} & \text{if } eret_i(h, \beta, n-1) \\[1ex]
ISRV(h, \beta[0 : n-2], o[0 : n-2], i) & \text{if } n > 1 \land \lnot eret_i(h, \beta, n-1) \\[1ex]
1 & \text{otherwise}
\end{cases}
\]
Note on ownership transfer.
Due to Definition 4.7 ownership transfer in an ISR block may happen on five
particular occasions.
• If the executing step is the last step of the block and it is an I/O step, then
we have transfer of addresses between the program thread owns-set and the
shared memory.
• If the executing step is JISR, eret , or some other ISR I/O step, then we
have transfer of addresses between the interrupt thread owns-set and the
shared memory.
• If the executing step is an I/O step and is not a step of the considered
processor, then we have transfer of addresses between the owns-set of the
stepping processor and the shared memory.
⁴ The effect of the TLB flushes is visible to the guest but not to the hypervisor since we do
not use address translation in system mode.
Figure 4.10: The state after a valid ISR is locally equal with respect to the
corresponding processor i with the initial state of the ISR.
Software Condition 7 (JISR/ERET Ownership Transfer) At the beginning and at the
end of a handler execution we allow ownership transfer according to the
following two rules.
• If the executing step is JISR, then the handler thread may acquire shared
addresses.
• If the executing step is an eret, then the handler thread may release
addresses into the shared set.
4.2.2 Simulation Relation
We define a simulation relation between two machines. The first executes the
original ISR block sequence and the second executes the reordered ISR block. We
take into account how we express the desired reordering. We postpone all local
program thread steps in the reordered execution and process them at once at the
corresponding consistency point in the proper order. Thus we need to record the
steps which are executed in the original execution and postponed in the reordered
execution. The ISR steps in both executions happen simultaneously. In the proof
we need to refer to the initial configuration in the interleaving point before JISR.
Therefore we pass this configuration also to the coupling relation. Further we
refer to the machine executing steps in the original order by h and to the machine
with the reordered execution by d.
Definition 4.27 (Simulation Relation) The predicate B defines the simulation
relation, which couples the configurations in the executions. It gets as
parameters
• the initial configuration h0,
• the current configuration in the original execution h,
• the current configuration in the reordered execution d,
• the list of program thread steps executed in the original schedule and post-
poned in the reordered one γ,
• the current ownership o,
• the index of the processor i,
• a flag ip denoting whether the current state is a program thread interleaving
point,
• a list of registers saved by the ISR in the IPCB sv,
• and a list of registers changed by the ISR ch.
Two configurations h and d satisfy our simulation relation if the following
conditions hold.
• Environment and shared equality hold.
• During ISR h and d satisfy local ISR equality. Furthermore the state of the
interrupted program thread, defined by the program thread owned addresses,
the program stack addresses and the old-core, in d is equal to the initial
configuration and in h it is defined by the local program thread steps γ
executed in h0.
• During program thread execution the handler thread memory except the
IPCB region stores the same values. The relation of the program thread
state in a program thread IP is different from the one in all other states. In
an IP h and d satisfy local equality and all postponed steps are executed.
Otherwise the program thread state in d is equal to the initial configuration
and in h it is defined by the local program thread steps γ executed in h0.
Figure 4.11: Simulation relation.
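\[
\begin{aligned}
B(h^0, h, d \in C_M,\ & \gamma \in (\Sigma_M)^*,\ o \in O_{XT},\ i \in Pid,\ ip \in \mathbb{B},\ ch \in 2^{\mathbb{B}^5},\ sv \in 2^{\mathbb{B}^5}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} \\
& eq^E_i(h, d, o) \\
& \land\ eq^{Sh}_i(h, d, o) \\
& \land\ \big(isr(h.apic_i) \implies \quad \text{(Case 1)} \\
& \qquad eq^{LH}_i(h, d, o, ch, sv) \\
& \qquad \land\ h.oCore_i = \Delta^{|\gamma|}(h^0, \gamma).core_i \\
& \qquad \land\ d.oCore_i = h^0.core_i \\
& \qquad \land\ \forall a \in o.O^p_i \cup stack(o.O^s_i, 0, h.oCore_i.gpr(rsp)). \\
& \qquad\quad h.M(a) = \Delta^{|\gamma|}(h^0, \gamma).M(a) \\
& \qquad\quad \land\ (a > d.oCore_i.gpr(rsp) \implies d.M(a) = h^0.M(a))\big) \\
& \land\ \big(\lnot isr(h.apic_i) \implies \\
& \qquad \forall a \in o.O^h_i \setminus A_{ipcb_i}.\ h.M(a) = d.M(a) \\
& \qquad \land\ (\lnot ip \implies eq^L_i(h, \Delta^{|\gamma|}(h^0, \gamma), o) \land eq^L_i(h^0, d, o)) \quad \text{(Case 2)} \\
& \qquad \land\ (ip \implies \gamma = \varepsilon \land eq^L_i(h, d, o))\big) \quad \text{(Case 3)}
\end{aligned}
\]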
4.2.3 Simulation Theorem
Theorem 4.10 (ISR Block Simulation Theorem) For every ISR block of any
processor i in a MIPSP IP schedule execution defined by initial configuration $h^0$
and input sequence β, if every extended schedule execution sequence starting from
$h^0$ contains only valid ISRs and is safe with respect to some ownership sequence,
and $d^m$ is a configuration equivalent to the initial configuration of the ISR block
$h^m$, there exist an input sequence ω and a step function⁵ $s \in \mathbb{N} \to \mathbb{N}$, such that
if we execute ω starting in $d^m$, then:
• the execution sequence defined by dm and ω is an extended IP schedule for
processor i and
• the simulation relation B is maintained for every step l in the ISR block
between configurations hl and ds(l).
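\[
\begin{aligned}
\forall h^0, \beta, d^m, i, o, m, n.\ & IPsched(h^0, \beta) \\
& \land\ ISRblock(h^m, \beta[m : n], i) \\
& \land\ (\forall \rho.\ \exists o'.\ schedXT(h^0, \rho) \implies (ISRV(h^0, \rho) \land \mathit{safeseq}_{XT}(h^0, \rho, o, o'))) \\
& \land\ B(h^m, h^m, d^m, \varepsilon, o, i, 1, \emptyset, \emptyset) \\
\implies\ & \exists \omega, s.\ schedXT_i(d^m, \omega) \\
& \land\ \forall l \in [m : n+1].\ \exists \gamma, o', ch, sv.\ B(h^m, h^l, d^{s(l)}, \gamma, o', i, ip_l, ch, sv)
\end{aligned}
\]
where
\[
ip_l = IP_p(h^0, \beta, i, l)
\]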
We note that by the definition of ISR block only the first and the last configurations
are interleaving points of the program thread.
Proof First we instantiate ω with the reordered version of the ISR block β[m : n]
ω = reoISRb(hm, β[m : n], i)
and apply Lemma 4.9 to conclude that this instantiation results in an extended
IP schedule execution for processor i.
As a second step we prove by induction over l the simulation relation for every
state hl and the corresponding state ds(l).
Induction base l = m
The simulation relation for the base case
\[ B(h^m, h^m, d^{s(m)}, \gamma^m, o^m, i, ip^m, ch^m, sv^m) \]
is trivially proven by the hypothesis, when we set s(m) = m and instantiate the
parameters as follows.
\[
\begin{aligned}
\gamma^m &= \varepsilon \\
o^m &= o \\
ip^m &= 1 \\
ch^m &= \emptyset \\
sv^m &= \emptyset
\end{aligned}
\]
⁵ A step function is a monotonically increasing function defined on a subset of integers.
Induction step l→ l + 1
In the induction step we make a case split on βl.
The hypothesis of our theorem does not depend on our induction parameter.
Thus, for proving the claim of the theorem for the state after executing βl (we
call this the induction claim), we can rely on the theorem's hypothesis and on the
theorem's claim for l (we call this the induction hypothesis), i.e. the simulation
relation before executing step βl of the original execution sequence.
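\[
\begin{aligned}
&\forall h^0, \beta, d^m, i, o, m, n. \\
&\quad IPsched(h^0, \beta) \\
&\quad \land\ ISRblock(h^m, \beta[m:n], i) \\
&\quad \land\ B(h^m, h^m, d^m, \varepsilon, o, i, 1, \emptyset, \emptyset) \\
&\quad \land\ \forall \rho.\ \exists o'.\ sched_{XT}(h^0, \rho) \implies ISRV(h^0, \rho) \land safeseq_{XT}(h^0, \rho, o, o') \\
&\quad \land\ \exists \gamma^l, o^l, ch^l, sv^l.\ B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, ip^l, ch^l, sv^l) \\
&\implies \exists \gamma^{l+1}, o^{l+1}, ch^{l+1}, sv^{l+1}.\ B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, ip^{l+1}, ch^{l+1}, sv^{l+1})
\end{aligned}
\]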
From Lemma 4.8 we can conclude the following properties for βl.
• βl cannot be a guest step or a VMEXIT step of processor i. This means
that the steps of processor i in β[m : n] are only core steps in system mode.
• During ISR we may interleave with IPI steps or steps of other processors
(guest and system mode).
• Not more than one program thread step6 may be an I/O step.
Due to the structure of our simulation relation we have to distinguish on two
parameters in the configurations in the induction hypothesis and in the induction
claim.
⁶ JISR and eret steps count as interrupt thread steps.
• Are the states hl and hl+1 ISR states or program thread states, i.e. is the
ISR bit set?
• Are the states hl and hl+1 program thread IPs?
Based on the ISR state we define five categories of steps.
• Program thread steps: start and end in a state with a clear ISR bit.
• JISR step: starts in a state with a clear ISR bit and ends in a state with a
set ISR bit.
• ISR steps: start and end in a state with a set ISR bit.
• eret step: starts in a state with a set ISR bit and ends in a state with a
clear ISR bit.
• Other steps, i.e. IPI steps or steps of other processors.
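The five categories above depend only on whether the step belongs to processor i and on the ISR bit before and after the step. A minimal sketch of this classification follows; the function and parameter names are hypothetical.

```python
def classify_step(own_step: bool, isr_before: bool, isr_after: bool) -> str:
    """Classify a step of the ISR block by the ISR bit of processor i
    before and after the step (toy encoding, names are assumptions)."""
    if not own_step:
        return "other"           # IPI step or step of another processor
    if not isr_before and not isr_after:
        return "program thread"  # ISR bit clear before and after
    if not isr_before and isr_after:
        return "JISR"            # ISR bit gets set
    if isr_before and isr_after:
        return "ISR"             # ISR bit set before and after
    return "eret"                # ISR bit gets cleared
```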
According to the definition of the ISR block we know that we have a program
thread IP only at the beginning and at the end of an ISR block. This means
that we need an additional case distinction on the state before a JISR, i.e.
a JISR step may start in both an IP state and a "normal" state, and on the state
before and after a program thread step:
• The first program thread step in the block (i.e. if the block starts with a
local program thread step, which is followed by more program thread steps
or JISR) starts in a program thread IP and ends in a ”normal” state.
• The last program thread step in the block starts in a ”normal” state and
ends in a program thread IP.
• All other program thread steps neither start nor end in a program thread
IP.
Based on the above observations we define eight proof cases.
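Before going through the cases, the bookkeeping they share can be sketched on a toy model. The code below is an illustration under strong simplifying assumptions, not the reoISRb function of the thesis: program-thread steps and all other steps act on disjoint state components, program-thread steps are postponed until the last one (Cases 1 to 3), and every other step is executed immediately on the reordered machine (Cases 4 to 8). All names are hypothetical.

```python
def reorder_block(block):
    """block: list of (kind, payload) with kind in
    {'prog', 'jisr', 'isr', 'eret', 'other'}.
    Returns the reordered sequence omega and the list s, where s[k] is the
    number of steps the reordered machine has executed after k original steps."""
    prog_positions = [k for k, (kind, _) in enumerate(block) if kind == "prog"]
    last_prog = prog_positions[-1] if prog_positions else None
    omega, gamma, s = [], [], [0]
    for k, step in enumerate(block):
        if step[0] == "prog" and k != last_prog:
            gamma.append(step)            # postpone (Cases 1 and 3)
        elif step[0] == "prog":
            omega.extend(gamma + [step])  # flush postponed steps (Case 2)
            gamma = []
        else:
            omega.append(step)            # execute on both machines (Cases 4 to 8)
        s.append(len(omega))
    return omega, s


def run(steps, state=(0, 0)):
    """Toy semantics: 'prog' steps update the first component, every other
    step updates the second one, so the two classes commute."""
    p, h = state
    for kind, payload in steps:
        if kind == "prog":
            p = 2 * p + payload
        else:
            h = 3 * h + payload
    return p, h
```

In this toy model both orders reach the same final state because the relative order inside each class of steps is preserved. The actual proof needs ownership safety and the software conditions to justify the corresponding commutation.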
Case 1: βl is the first program thread step in the block.
l = m
s(l) = m
Since we assume βl to be a program thread step, we can deduce
¬(isr(hl.apici) ∨ isr(hl+1.apici)).
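The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^m, d^m, \varepsilon, o^l, i, 1, \emptyset, \emptyset) = \\
&\quad eq^{E}_i(h^m, d^m, o^l) \\
&\quad \land\ eq^{Sh}_i(h^m, d^m, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^m.M(a) = d^m.M(a) \\
&\quad \land\ eq^{L}_i(h^m, d^m, o^l)
\end{aligned}
\]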
The execution of $\beta_l$ in $h^l$ changes only local components of processor i, i.e.
the core, the local program thread memory and the allocated portion of the stack.
On d we do not execute any step. $\beta_l$ is attached to the list of postponed steps. All
other simulation relation parameters stay the same as in the induction hypothesis.
s(l + 1) = s(l) = m
hl+1 = ∆(hl, (core, i, 0))
ds(l+1) = dm
γl+1 = βl
ol+1 = ol
ipl+1 = 0
chl+1 = ∅
svl+1 = ∅
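The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, ip^{l+1}, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ \forall a \in o^{l+1}.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l+1)}.M(a) \\
&\quad \land\ eq^{L}_i(h^{l+1}, \Delta^{|\gamma^{l+1}|}(h^m, \gamma^{l+1}), o^{l+1}) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l+1)}, o^{l+1})
\end{aligned}
\]
After instantiation of the parameters:
\[
\begin{aligned}
&B(h^m, h^{m+1}, d^m, \beta_l, o^l, i, 0, \emptyset, \emptyset) = \\
&\quad eq^{E}_i(h^{m+1}, d^m, o^l) \\
&\quad \land\ eq^{Sh}_i(h^{m+1}, d^m, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^{m+1}.M(a) = d^m.M(a) \\
&\quad \land\ eq^{L}_i(h^{m+1}, \Delta(h^m, \beta_l), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^m, o^l)
\end{aligned}
\]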
We can apply Lemma 4.6, which proves the environment and the shared equality
between $h^{m+1}$ and $d^m$ (the first two conjuncts). The third one is trivially proven
by the hypothesis, since the ownership safety guarantees that addresses owned by
the interrupt handler and their memory content are unchanged by the program
thread. The local equality $eq^{L}_i(h^{m+1}, \Delta(h^m, \beta_l), o^l)$ is trivial, since $h^{m+1}$ is the
resulting configuration after executing $\beta_l$ in $h^m$. The local equality $eq^{L}_i(h^m, d^m, o^l)$
is contained in the hypothesis.
Case 2: βl is the last program thread step in the block.
l = n
Since we assume βl to be a program thread step, we can deduce
¬(isr(hl.apici) ∨ isr(hl+1.apici)).
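The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^l.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]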
$h^{l+1}$ is defined by the execution of $\beta_l$ in $h^l$ and is an IP. On d we execute the
list of postponed steps $\gamma^l$ followed by $\beta_l$ to define $d^{s(l+1)}$.
\[
\begin{aligned}
s(l+1) &= n + 1 \\
h^{l+1} &= \Delta(h^l, (core, i, 0)) \\
d^{s(l+1)} &= \Delta^{|\gamma^l|+1}(d^{s(l)}, \gamma^l \beta_l)
\end{aligned}
\]
We split the execution of $\gamma^l \beta_l$ and define an intermediate configuration
\[ d' = \Delta^{|\gamma^l|}(d^{s(l)}, \gamma^l) \]
after executing $\gamma^l$ in $d^{s(l)}$.
If we look at the last conjunct of the simulation relation from the induction
hypothesis
\[ eq^{L}_i(h^m, d^{s(l)}, o^l) \]
and apply Lemma 4.5 ($|\gamma^l|$ times), we get
\[ eq^{L}_i(\Delta^{|\gamma^l|}(h^m, \gamma^l), \Delta^{|\gamma^l|}(d^{s(l)}, \gamma^l), o^l) \]
which, in combination with the second-to-last conjunct of the simulation relation
from the induction hypothesis
\[ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \]
implies
\[ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(d^{s(l)}, \gamma^l), o^l). \]
Hence d′ is locally consistent with $h^l$.
Additionally, from our induction hypothesis we know that $h^l$ and $d^{s(l)}$ are
consistent with respect to the shared components and the environment.
\[ eq^{E}_i(h^l, d^{s(l)}, o^l) \land eq^{Sh}_i(h^l, d^{s(l)}, o^l) \]
This consistency is maintained by the execution of local steps (Lemma 4.6).
\[ eq^{E}_i(h^l, \Delta^{|\gamma^l|}(d^{s(l)}, \gamma^l), o^l) \land eq^{Sh}_i(h^l, \Delta^{|\gamma^l|}(d^{s(l)}, \gamma^l), o^l) \]
Hence d′ is consistent with $h^l$ with respect to the shared components and the
environment.
Furthermore the execution of program thread steps does not change addresses
owned by the interrupt handler.
∀a ∈ ol.Ohi \ Aipcb i. hl.M(a) = d′.M(a)
Now let us consider the execution of βl on both machines. Since βl might
be an I/O step, we need a case distinction. Most of the instantiation parameters
for the simulation relation have the same values in both cases, i.e. the execution
ends in an IP, all postponed instructions are executed, the lists of changed and
stored registers do not change.
γl+1 = ε
ipl+1 = 1
chl+1 = chl
svl+1 = svl
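The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \varepsilon, o^{l+1}, i, 1, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ \forall a \in o^{l+1}.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l+1)}.M(a) \\
&\quad \land\ eq^{L}_i(h^{l+1}, d^{s(l+1)}, o^{l+1})
\end{aligned}
\]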
Case 2.1: The execution of $\beta_l$ in $h^l$ changes only local components of processor
i, i.e. the core, the local program thread memory and the allocated portion of the
stack. The ownership does not change.
\[ o^{l+1} = o^l \]
The execution of $\beta_l$ on both machines d′ and $h^l$ maintains the consistency:
all CPUs and all memory addresses (except IPCB and unallocated stack addresses)
are equal in d′ and $h^l$, and ownership safety only allows accesses to those equal
resources, so the same holds for $h^{l+1}$ and $d^{s(l+1)}$.
\[
\begin{aligned}
&eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^l) \\
&\land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^l) \\
&\land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l+1)}.M(a) \\
&\land\ eq^{L}_i(h^{l+1}, d^{s(l+1)}, o^l)
\end{aligned}
\]
Case 2.2: The execution of βl might change also shared components. Ad-
ditionally the ownership might change. As in the previous case the changes in
the configurations will be equivalent on both machines and preserve the equality
(Lemma 4.7). Still since the ownership is a parameter of our simulation relation
we have to look deeper into the consistencies. We have to guarantee that if the
ownership changes the predicates still hold. Due to ownership safety, changes in
the ownership may appear only in o.Ash and o.Opi (Definition 4.5). But since the
union of these two sets of addresses stays unchanged (Definition 3.7), the changes
may only lead to redistribution of consistency conditions for particular addresses
between the predicates for local or shared equality, which are trivially proven by
the hypothesis.
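The redistribution argument can be illustrated on toy data. The sketch below uses hypothetical address sets and memories; it only shows that if two memories agree on the union of two address sets, they still agree on both sets after addresses are moved between them while the union stays fixed.

```python
def equal_on(addresses, mem_h, mem_d):
    """True if both memories store the same value at every given address."""
    return all(mem_h[a] == mem_d[a] for a in addresses)


# Toy memories: address 3 differs but belongs to neither set.
mem_h = {0: 7, 1: 8, 2: 9, 3: 1}
mem_d = {0: 7, 1: 8, 2: 9, 3: 5}

shared_before, owned_before = {0, 1}, {2}
shared_after, owned_after = {0}, {1, 2}   # address 1 changes its owner

# The union of both sets is unchanged by the transfer.
union_unchanged = (shared_before | owned_before) == (shared_after | owned_after)

# Equality on both sets holds before and after the transfer.
holds_before = equal_on(shared_before, mem_h, mem_d) and equal_on(owned_before, mem_h, mem_d)
holds_after = equal_on(shared_after, mem_h, mem_d) and equal_on(owned_after, mem_h, mem_d)
```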
Case 3: βl is a program thread local step, which does not border on an IP.
Since we assume βl to be a program thread step, we can deduce
¬(isr(hl.apici) ∨ isr(hl+1.apici)).
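The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^l.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]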
The execution of $\beta_l$ in $h^l$ changes only local components of processor i, i.e.
the core, the local program thread memory and the allocated portion of the stack.
On d we do not execute any step. $\beta_l$ is attached to the list of postponed steps. All
other simulation relation parameters stay the same as in the induction hypothesis.
s(l + 1) = s(l)
hl+1 = ∆(hl, (core, i, 0))
ds(l+1) = ds(l)
γl+1 = γlβl
ol+1 = ol
ipl+1 = 0
chl+1 = chl
svl+1 = svl
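The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, 0, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^{l+1}, \Delta(h^m, \gamma^{l+1}), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]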
We can apply Lemma 4.6, which proves the environment and the shared equality
between $h^{l+1}$ and $d^{s(l)}$ (the first two conjuncts). The third one is trivially
proven by the hypothesis, since the ownership safety guarantees that addresses
owned by the interrupt handler are unchanged by the program thread. The local
equality $eq^{L}_i(h^{l+1}, \Delta(h^m, \gamma^{l+1}), o^l)$ follows from the local equality
$eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l)$ from the hypothesis and Lemma 4.5.
\[
\begin{aligned}
&eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \\
&\implies eq^{L}_i(\Delta(h^l, \beta_l), \Delta(\Delta^{|\gamma^l|}(h^m, \gamma^l), \beta_l), o^l) \qquad \text{(Lemma 4.5)} \\
&\implies eq^{L}_i(h^{l+1}, \Delta(h^m, \gamma^{l+1}), o^l)
\end{aligned}
\]
The local equality $eq^{L}_i(h^m, d^{s(l)}, o^l)$ is contained in the hypothesis.
Case 4: βl is a JISR step, that does not start in an IP.
Since we assume βl to be a JISR step, we can deduce
¬isr(hl.apici) ∧ isr(hl+1.apici).
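The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^l.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]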
From the shared equality in the simulation relation we know that the APICs
on both machines are equal and the IPI signal is active. Thus we proceed by
executing a JISR on hl and ds(l). The execution of βl changes local components
and the local APIC of processor i. The list of postponed steps does not
change. The ownership may change. The lists of changed and stored registers are
empty, as this is the initial state of the ISR.
hl+1 = ∆(hl, (core, i, 1))
ds(l+1) = ∆(ds(l), (core, i, 1))
γl+1 = γl
ipl+1 = 0
chl+1 = ∅
svl+1 = ∅
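The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, 0, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{LH}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}, \emptyset, \emptyset) \\
&\quad \land\ h^{l+1}.oCore_i = \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i \\
&\quad \land\ d^{s(l+1)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^{l+1}.O^{p}_i \cup stack(o^{l+1}.O^{s}_i, 0, h^{l+1}.oCore_i.gpr(rsp)). \\
&\qquad\quad h^{l+1}.M(a) = \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) \\
&\qquad\quad \land\ d^{s(l+1)}.M(a) = h^m.M(a)
\end{aligned}
\]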
The execution of the JISR step, as defined by Definition 4.1, Definition 2.20
and Definition 2.44, changes only the special purpose registers, the program
counter and the APIC, and copies the core into oCore. Software Condition 7
restricts the possible ownership transfer to acquiring addresses from the shared
memory $o.A_{sh}$ into the owns-set of the handler thread $o.O^{h}_i$. Since
\[ \forall j \neq i.\ o^{l+1}.O_j = o^l.O_j \]
and
\[ o^{l+1}.A_{sh} \subseteq o^l.A_{sh} \]
the ownership transfer does not generate any additional proof obligations for
shared and environment equality. Thus the environment equality is preserved.
The changes in the APICs are equivalent on both machines, therefore the shared
equality is also maintained.
Furthermore, with respect to the ISR local equality
\[ eq^{LH}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}, \emptyset, \emptyset) \]
we know from the MIPSP transition function and the definition of $\delta^{jisr}$ that the
program counter has the same value on both machines, and that no GPR register
is changed by the JISR step.
\[
\begin{aligned}
&h^{l+1}.pc_i = d^{s(l+1)}.pc_i = 0 \\
&\land\ h^{l+1}.gpr = h^l.gpr \\
&\land\ d^{s(l+1)}.gpr = d^{s(l)}.gpr
\end{aligned}
\]
Thus instantiating the set of changed registers ch with the empty set satisfies
the simulation conjuncts about the GPRs. The SPR registers are equal due to
the JISR semantics and the hypothesis.
We also know from the hypothesis that in hl and ds(l) the shared memory is
equal. The same holds for the handler memory excluding the IPCB. Since none
of them is changed by JISR, this implies that the newly acquired addresses store
the same value on both machines, and hence the handler memory excluding the
IPCB is equal also in the new configurations
∀a ∈ ol+1.Ohi \ Aipcb i. hl+1.M(a) = ds(l+1).M(a) .
The set of registers stored in the IPCB sv is empty and the corresponding
simulation relation conjunct
ipcbeq(hl+1, ds(l+1), i, ∅)
is trivially proven. The condition about the handler stack region is fulfilled, since
no memory address is allocated for it
¬∃a ∈ ol.Osi . (hl+1.oCorei.gpr(rsp) ≥ a) ∧ (a > hl+1.corei.gpr(rsp)) .
Therefore the ISR local equality holds after βl.
The conditions about the oCore components are proven with the local equality
from the hypothesis
\[ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \land eq^{L}_i(h^m, d^{s(l)}, o^l) \qquad \text{(IH)} \]
since oCore stores the core configurations from the pre-state.
\[
\begin{aligned}
h^{l+1}.oCore_i &= h^l.core_i && (\delta^{jisr}) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i && \text{(IH)} \\
d^{s(l+1)}.oCore_i &= d^{s(l)}.core_i && (\delta^{jisr}) \\
&= h^m.core_i && \text{(IH)}
\end{aligned}
\]
The equality of the program thread memory is proven by the local equality
claimed in the hypothesis.
Case 5: βl is a JISR step that starts in an IP.
This case is similar to the previous one. Since we start in an IP, i.e. at the
beginning of the ISR block, l = m. We execute the same step on both machines,
the lists of postponed steps before and after JISR are empty.
γl = γl+1 = ε
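The difference to Case 4 appears in the local equality condition of the hypothesis:
\[
\begin{aligned}
&B(h^m, h^m, d^{s(l)}, \varepsilon, o^l, i, 1, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^m, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^m, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^m.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]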
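The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, 0, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{LH}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}, \emptyset, \emptyset) \\
&\quad \land\ h^{l+1}.oCore_i = h^m.core_i \\
&\quad \land\ d^{s(l+1)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^{l+1}.O^{p}_i \cup stack(o^{l+1}.O^{s}_i, 0, h^{l+1}.oCore_i.gpr(rsp)). \\
&\qquad\quad h^{l+1}.M(a) = h^m.M(a) \\
&\qquad\quad \land\ d^{s(l+1)}.M(a) = h^m.M(a)
\end{aligned}
\]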
To prove the simulation in this case we use the same arguments as in the
previous case, but with an empty list of postponed instructions.
Case 6: βl is an ISR step (between JISR and eret).
Since we assume βl to be an ISR step, we can deduce
isr(hl.apici) ∧ isr(hl+1.apici).
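The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{LH}_i(h^l, d^{s(l)}, o^l, ch^l, sv^l) \\
&\quad \land\ h^l.oCore_i = \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i \\
&\quad \land\ d^{s(l)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^l.O^{p}_i \cup stack(o^l.O^{s}_i, 0, h^l.oCore_i.gpr(rsp)). \\
&\qquad\quad h^l.M(a) = \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) \\
&\qquad\quad \land\ d^{s(l)}.M(a) = h^m.M(a)
\end{aligned}
\]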
The simulation relation in the induction claim has the same structure and
contains the same conjuncts.
ISR steps are executed simultaneously on both machines. According to our
MIPSP model, ISR steps are non-interrupted core steps. They may depend on the
shared memory, the interrupt thread's owned memory, the stack and the
GPR content. We consider three sub-cases: save context steps, restore context
steps and the handler steps in between. We have the following instantiation
of the simulation parameters in all three cases:
s(l + 1) = s(l) + 1
hl+1 = ∆(hl, (core, i, 0))
ds(l+1) = ∆(ds(l), (core, i, 0))
γl+1 = γl
ipl+1 = 0
The other parameters we define corresponding to the particular sub-case seman-
tics.
In the induction hypothesis of all three sub-cases we have shared and ISR
local equality for processor i, which implies equality of the shared memory, the
interrupt thread's owned memory, and the stack (with displacement). For
the GPR content we rely on Software Condition 5: the handler first stores the
initial GPR values (i.e. from the state before JISR) in the IPCB, and it overwrites
a register before reading it outside its save context part. Therefore the relevant
registers are equal in both configurations during the ISR.
Case 6.1: βl is a save context ISR step.
In this case we are considering a store instruction that saves an unchanged, hence
not listed in svl and chl, register in the IPCB. In the following we denote by r the
index of the register which is saved by βl, where iai(r) is the corresponding address
in the IPCB. We instantiate the remaining simulation parameters considering the
semantics of the store instruction execution. βl is not an I/O step, hence the
ownership does not change. The same is true for the list of changed registers.
svl+1 contains svl and r.
\[
\begin{aligned}
o^{l+1} &= o^l \\
ch^{l+1} &= ch^l \\
sv^{l+1} &= sv^l \cup \{r\}
\end{aligned}
\]
The only components changed by $\beta_l$ on both machines are the program counter
and the IPCB.
\[
\begin{aligned}
h^{l+1}.core_i.pc &= h^l.core_i.pc + 4 \\
h^{l+1}.M_4(ia_i(r)) &= h^l.core_i.gpr(r) \\
d^{s(l+1)}.core_i.pc &= d^{s(l)}.core_i.pc + 4 \\
d^{s(l+1)}.M_4(ia_i(r)) &= d^{s(l)}.core_i.gpr(r)
\end{aligned}
\]
All simulation relation parts which do not include these two components are
preserved. The only conjunct that we have to prove is the ISR local equality
\[ eq^{LH}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}, ch^{l+1}, sv^{l+1}). \]
The program counter, all registers, and the content of all memory addresses
owned by the handler, except the IPCB ones, are trivially equal after $\beta_l$ on both
machines. The instantiation of sv and ch preserves the condition on them embedded
in the GPR equality, i.e. $ch^{l+1} \subseteq sv^{l+1}$. The only condition that requires
an argument is the IPCB condition.
\[
\begin{aligned}
&ipcbeq(h^{l+1}, d^{s(l+1)}, i, sv^{l+1}) \overset{\text{def}}{=} \\
&\quad \forall a \in A_{ipcb_i}.\ reg_i(a) \in sv^{l+1} \implies \\
&\qquad h^{l+1}.M_4(a[31:2]00) = h^{l+1}.oCore_i.gpr(reg_i(a)) \\
&\qquad \land\ d^{s(l+1)}.M_4(a[31:2]00) = d^{s(l+1)}.oCore_i.gpr(reg_i(a))
\end{aligned}
\]
As we have said, $\beta_l$ stores the content of register r in the memory word starting
at $ia_i(r)$. Since the newly saved register has not yet been overwritten by the ISR
(Software Condition 5), its value in the IPCB is equal to the corresponding value
in the oCore by the induction hypothesis. And we know that oCore registers are
written only by a JISR step and stay unchanged during ISR execution.
\[
\begin{aligned}
h^{l+1}.M_4(ia_i(r)) &= h^l.core_i.gpr(r) && (\beta_l) \\
&= h^l.oCore_i.gpr(r) && \text{(IH)} \\
&= h^{l+1}.oCore_i.gpr(r) && (\beta_l)
\end{aligned}
\]
The same holds also for d.
The only IPCB addresses changed by $\beta_l$ are $ia_i(r)$ and the three consecutive
ones. For each of these four addresses a, $reg_i$ returns the same value, i.e. the
index r, and the aligned version of each of the four addresses is $ia_i(r)$.
\[
\begin{aligned}
&\forall a \in \{ia_i(r),\ ia_i(r)[31:2]01,\ ia_i(r)[31:2]10,\ ia_i(r)[31:2]11\}. \\
&\quad reg_i(a) = r \\
&\quad \land\ a[31:2]00 = ia_i(r)
\end{aligned}
\]
All other IPCB addresses corresponding to registers in svl+1 store the proper
oCore registers due to the induction hypothesis. Thus, the IPCB condition is
also fulfilled.
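The alignment argument can be checked on concrete numbers. The sketch below assumes a hypothetical word-aligned IPCB base address and a layout with one 4-byte word per register; neither the constant nor the helper names are taken from the thesis.

```python
IPCB_BASE = 0x1000  # hypothetical, word-aligned base address of the IPCB


def align_word(a: int) -> int:
    """a[31:2]00: clear the two least significant address bits."""
    return a & ~0x3


def ia(r: int) -> int:
    """Assumed IPCB word address of register r (one word per register)."""
    return IPCB_BASE + 4 * r


def reg(a: int) -> int:
    """Assumed inverse mapping: register index stored at byte address a."""
    return (a - IPCB_BASE) // 4


def word_addresses(r: int) -> list:
    """The four byte addresses of the IPCB word of register r."""
    return [ia(r) + offset for offset in range(4)]
```

Under these assumptions every byte address of a saved word maps to the same register index and to the same aligned address, which is the property used above.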
Case 6.2: βl is a restore-context ISR step.
In this case we are considering a load instruction that restores a register value
from the IPCB. In the following we denote by r the index of the register which
is restored by βl, where iai(r) is the corresponding address in the IPCB. We
instantiate the remaining simulation parameters considering the semantics of the
load instruction execution. Like in the previous case βl is not an I/O step, and
the ownership does not change ol+1 = ol. The list of stored registers does not
change svl+1 = svl. We have to exclude the index of the restored register from
chl to obtain chl+1.
\[
\begin{aligned}
o^{l+1} &= o^l \\
ch^{l+1} &= ch^l \setminus \{r\} \\
sv^{l+1} &= sv^l
\end{aligned}
\]
Similarly to the previous case most of the simulation relation is trivially proven
by the hypothesis, since the configuration changes are very limited. The only
components changed by βl are the program counter and the GPR.
\[
\begin{aligned}
h^{l+1}.core_i.pc &= h^l.core_i.pc + 4 \\
h^{l+1}.core_i.gpr(r) &= h^l.M_4(ia_i(r)) \\
d^{s(l+1)}.core_i.pc &= d^{s(l)}.core_i.pc + 4 \\
d^{s(l+1)}.core_i.gpr(r) &= d^{s(l)}.M_4(ia_i(r))
\end{aligned}
\]
The program counter equality is trivial. We only have to take care of the register
that $\beta_l$ restores.
We restore the register value from the IPCB. By the induction hypothesis, IPCB
values correspond to the content of oCore registers. We deduce that after $\beta_l$ the
considered register stores a value equal to the one in oCore on both machines. We
show here only the argument for h.
\[
\begin{aligned}
h^{l+1}.core_i.gpr(r) &= h^l.M_4(ia_i(r)) && (\beta_l) \\
&= h^l.oCore_i.gpr(r) && \text{(IH)} \\
&= h^{l+1}.oCore_i.gpr(r) && (\beta_l)
\end{aligned}
\]
Thus the simulation relation holds.
Case 6.3: βl is an ISR step, which does not belong to the save context or restore
context part.
In this sub-case all components which are relevant to the step's execution (input
data) are equal in both machines. If the instruction reads memory, we rely on
ownership safety and the hypothesis. Operands coming from general purpose
registers are equal due to Software Condition 5 (i.e. any source register rs has been
previously overwritten by the ISR and is included in chl) and the hypothesis.
Based on the same arguments as in Lemma 4.7 and Lemma 4.5, we claim that
the changes to the configuration are equal on both machines, taking into account
the relaxed local equality for handler thread machines. These arguments state that,
since the input data for the instruction processing is equal, the output
data and hence the changes in the GPR, the stack, the handler owned memory or
the shared memory are equivalent as well. IPCB, oCore, SPR and the owned memory of
the program thread are not changed by the execution of βl, due to the MIPSP
architecture, ownership safety and our assumption that βl is not a save context
ISR step. With these arguments most of the simulation relation is proven trivially.
In the following lines we look into the eqLH conditions which do not obviously
hold.
We instantiate the remaining simulation parameters step by step, since we
have to consider some case distinctions.
The list of stored registers does not change.
svl+1 = svl
If the executed instruction does not change the GPR or writes to a destination
register rd, which is contained in chl, then
chl+1 = chl .
Otherwise we add the index of the newly changed register rd to the list of changed
registers.
\[ ch^{l+1} = ch^l \cup \{rd\} \]
This leads to a new proof obligation
\[ rd \in ch^{l+1} \implies d^{s(l+1)}.core_i.gpr(rd) = h^{l+1}.core_i.gpr(rd) \]
coming from the gpreq predicate in the handler local equality. Since rd stores the
same new value on $h^{l+1}$ and $d^{s(l+1)}$, the statement is trivially proven.
If βl is a local step, the ownership does not change
ol+1 = ol
and the intended simulation relation after βl is trivially proven.
Finally, we only have to examine the possible ownership transfer in case of an
I/O step. If
\[ o^{l+1} \neq o^l \]
the ownership transfer is possible between the shared memory $o.A_{sh}$ and the
owns-set of the handler thread $o.O^{h}_i$, excluding the IPCB addresses. The union
of both sets however stays the same.
\[ o^{l+1}.A_{sh} \cup o^{l+1}.O^{h}_i = o^l.A_{sh} \cup o^l.O^{h}_i \]
According to the induction hypothesis and the simulation relation in the state
before $\beta_l$, both sets $o^l.A_{sh}$ and $o^l.O^{h}_i$ collect addresses of memory which stores the
same values on $h^l$ and $d^{s(l)}$ (due to eqSh and eqLH). Thus, the redistribution
of these addresses alone does not generate any additional proof obligations
regarding the equality of shared and handler owned memory on $h^{l+1}$ and $d^{s(l+1)}$.
And as we said above, if $\beta_l$ changes the memory, then the changes are equivalent
on both machines. Therefore the intended simulation relation is proven.
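The bookkeeping of the sets ch and sv across the three sub-cases can be sketched as follows. The instruction encoding and the function name are hypothetical; the assertions mirror the conditions stated in the text (a save step targets an unchanged, not yet saved register, and ch ⊆ sv holds throughout).

```python
def track_registers(handler):
    """Track the sets ch (changed) and sv (saved) over a toy handler,
    given as a list of ('save', r), ('restore', r) or ('write', rd)."""
    ch, sv = set(), set()
    for kind, r in handler:
        if kind == "save":
            # Case 6.1: only unchanged, not yet saved registers are saved.
            assert r not in ch and r not in sv
            sv.add(r)
        elif kind == "restore":
            # Case 6.2: a restored register is no longer changed.
            ch.discard(r)
        elif kind == "write":
            # Case 6.3: a written destination register becomes changed.
            ch.add(r)
        else:
            raise ValueError(f"unknown step kind: {kind}")
        # Condition embedded in the GPR equality: ch is a subset of sv.
        assert ch <= sv, "register overwritten before being saved"
    return ch, sv
```

A handler that saves, overwrites and then restores its registers ends with an empty ch, which is the situation required before the eret step in Case 7.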
Case 7: βl is an eret step. The same step is executed simultaneously on both
machines. The execution of βl changes only the program counter, the SPR and
the APIC.
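The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{LH}_i(h^l, d^{s(l)}, o^l, \emptyset, sv^l) \\
&\quad \land\ h^l.oCore_i = \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i \\
&\quad \land\ d^{s(l)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^l.O^{p}_i \cup stack(o^l.O^{s}_i, 0, h^l.oCore_i.gpr(rsp)). \\
&\qquad\quad h^l.M(a) = \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) \\
&\qquad\quad \land\ d^{s(l)}.M(a) = h^m.M(a)
\end{aligned}
\]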
We instantiate the simulation parameters as follows:
s(l + 1) = s(l) + 1
hl+1 = ∆(hl, (core, i, 0))
ds(l+1) = ∆(ds(l), (core, i, 0))
γl+1 = γl
ipl+1 = 0
chl+1 = chl
svl+1 = svl
Since we are executing an I/O step, the ownership may change. If $o^{l+1} \neq o^l$, the
ownership transfer is possible between the shared memory $o.A_{sh}$ and the owns-set
of the handler thread $o.O^{h}_i$, excluding the IPCB addresses (Software Conditions 4
and 7). Formally:
\[
\begin{aligned}
&(\forall j \neq i.\ o^{l+1}.O_j = o^l.O_j) \\
&\land\ (o^{l+1}.O^{p}_i = o^l.O^{p}_i) \\
&\land\ (o^{l+1}.O^{s}_i = o^l.O^{s}_i) \\
&\land\ (o^{l+1}.O^{h}_i \cup o^{l+1}.A_{sh} = o^l.O^{h}_i \cup o^l.A_{sh})
\end{aligned}
\]
Since $\beta_l$ does not change any memory and both memory regions are equal
in the state before $\beta_l$ according to the simulation relation, the redistribution of
addresses does not generate additional proof obligations.
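The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, ip^{l+1}, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ \forall a \in o^{l+1}.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l+1)}.M(a) \\
&\quad \land\ eq^{L}_i(h^{l+1}, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^{l+1}) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l+1)}, o^{l+1})
\end{aligned}
\]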
Since the changes on the APIC (the only non-local component changed) are
equivalent, the shared and environment equality are maintained. The equality of
handler memory is part of the local handler equality eqLH in the hypothesis. Next
we examine the local equality between $h^{l+1}$ and the initial configuration $h^m$ with
respect to the postponed steps $\gamma^l$.
\[ eq^{L}_i(h^{l+1}, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^{l+1}) \]
All memory addresses of the program thread memory and of the stack
\[ \forall a \in o^{l+1}.O^{p}_i \cup stack(o^{l+1}.O^{s}_i, 0, h^{l+1}.core_i.rsp) \]
store the same values in $h^l$ and $h^{l+1}$ due to the semantics of the eret step
and the induction hypothesis.
\[
\begin{aligned}
h^{l+1}.M(a) &= h^l.M(a) && (\delta^{eret}) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) && \text{(IH)}
\end{aligned}
\]
The IH also says that the GPR in the oCore is equal to the GPR after executing
$\gamma^l$ on $h^m$. From Software Condition 6 we know that after $\beta_l$ the GPR content
must be equal to the one in the oCore. As part of the ISR local equality in the
induction hypothesis, the GPR of $h^l$ is equal to the GPR in the oCore for all
registers not listed in $ch^l$. Thus we rely on the restore context part of the handler
to restore all changed registers and conclude an empty set of changed registers
\[ ch^{l+1} = \emptyset. \]
When we consider in addition the eret semantics we get:
\[
\begin{aligned}
h^{l+1}.core_i.gpr &= h^l.core_i.gpr && (\delta^{eret}) \\
&= h^l.oCore_i.gpr && (\text{IH};\ eq^{L}_i;\ ch^{l+1} = \emptyset) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i.gpr && \text{(IH)}
\end{aligned}
\]
Thus we only have to prove the SPR equality and the equality of the program
counter. We argue based on the MIPSP eret definition, the SPR equality
condition contained in the ISR local equality and the oCore condition from the
induction hypothesis.
\[
\begin{aligned}
h^{l+1}.core_i.pc &= h^l.core_i.spr(epc) && (\delta^{eret}) \\
&= h^l.oCore_i.pc && (\text{IH};\ eq^{L}_i) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i.pc && \text{(IH)} \\
h^{l+1}.core_i.spr(sr) &= h^l.core_i.spr(esr) && (\delta^{eret}) \\
&= h^l.oCore_i.spr(sr) && (\text{IH};\ eq^{L}_i) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i.spr(sr) && \text{(IH)} \\
h^{l+1}.core_i.spr(mode) &= h^l.core_i.spr(emode) && (\delta^{eret}) \\
&= h^l.oCore_i.spr(mode) && (\text{IH};\ eq^{L}_i) \\
&= \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i.spr(mode) && \text{(IH)}
\end{aligned}
\]
Thus $eq^{L}_i(h^{l+1}, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^{l+1})$ holds.
The proof for $eq^{L}_i(h^m, d^{s(l+1)}, o^{l+1})$ is similar and uses the same arguments.
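The interplay of JISR and eret used in Cases 4 and 7 can be illustrated with a toy core. This sketch is a strong simplification and not the MIPSP semantics: it assumes that JISR records pc, sr and mode in epc, esr and emode, jumps to address 0 and snapshots the old core as oCore, and that eret copies the exception registers back. Any further SPR updates of the real architecture are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Core:
    pc: int
    sr: int
    mode: int
    epc: int = 0
    esr: int = 0
    emode: int = 0


def jisr(core: Core):
    """Toy JISR: save pc/sr/mode in the exception registers and jump to 0.
    Returns the new core and the oCore snapshot of the pre-state."""
    new_core = replace(core, epc=core.pc, esr=core.sr, emode=core.mode, pc=0)
    return new_core, core


def eret(core: Core) -> Core:
    """Toy eret: restore pc/sr/mode from the exception registers."""
    return replace(core, pc=core.epc, sr=core.esr, mode=core.emode)
```

If the handler leaves epc, esr and emode untouched, the core after eret agrees with oCore on pc, sr and mode, which is the shape of the three equation chains above.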
Case 8: βl is not a step of processor i but some interleaved step executed on
another processor or an IPI step.
We remind the reader that interleaving in an ISR block may happen only in the
ISR sequence or right after an eret step. We execute the same step on both
machines. The execution of βl does not change anything local to processor i.
It may change components local to other processors and shared configuration
elements. Thus the simulation relation parameters stay unchanged except for the
configurations h and d, and the ownership.
Figure 4.12: The simulation relation for interleaved steps depends on the state of
the ISR bit. For interleaving with ISR steps the ISR bit is set (Case 8.1), whereas
for interleaved steps right after an eret step the ISR bit is cleared (Case 8.2).
s(l + 1) = s(l) + 1
hl+1 = ∆(hl, βl)
ds(l+1) = ∆(ds(l), βl)
γl+1 = γl
ipl+1 = 0
chl+1 = chl
svl+1 = svl
Since $\beta_l$ may be an I/O step, the ownership may change, but the ownership
set of processor i remains unchanged: $o^l.O_i = o^{l+1}.O_i$.
Due to the structure of our simulation relation, we need a case distinction on the
location of $\beta_l$ in the ISR block (Figure 4.12).
Case 8.1: βl is an interleaved step bordered by ISR steps.
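The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{LH}_i(h^l, d^{s(l)}, o^l, ch^l, sv^l) \\
&\quad \land\ h^l.oCore_i = \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i \\
&\quad \land\ d^{s(l)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^l.O^{p}_i \cup stack(o^l.O^{s}_i, 0, h^l.oCore_i.gpr(rsp)). \\
&\qquad\quad h^l.M(a) = \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) \\
&\qquad\quad \land\ d^{s(l)}.M(a) = h^m.M(a)
\end{aligned}
\]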
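The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, 0, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}) \\
&\quad \land\ eq^{LH}_i(h^{l+1}, d^{s(l+1)}, o^{l+1}, ch^l, sv^l) \\
&\quad \land\ h^{l+1}.oCore_i = \Delta^{|\gamma^l|}(h^m, \gamma^l).core_i \\
&\quad \land\ d^{s(l+1)}.oCore_i = h^m.core_i \\
&\quad \land\ \forall a \in o^l.O^{p}_i \cup stack(o^l.O^{s}_i, 0, h^{l+1}.oCore_i.gpr(rsp)). \\
&\qquad\quad h^{l+1}.M(a) = \Delta^{|\gamma^l|}(h^m, \gamma^l).M(a) \\
&\qquad\quad \land\ d^{s(l+1)}.M(a) = h^m.M(a)
\end{aligned}
\]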
If we apply Lemma 4.5 and Lemma 4.7 we know, that the changes on both
machines will be equivalent and the simulation relation preserved.
Case 8.2: βl is preceded by an eret step and followed by a local program thread
step.
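The simulation relation in the induction hypothesis is defined by:
\[
\begin{aligned}
&B(h^m, h^l, d^{s(l)}, \gamma^l, o^l, i, 0, ch^l, sv^l) = \\
&\quad eq^{E}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^l, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^l.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^l, \Delta^{|\gamma^l|}(h^m, \gamma^l), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]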
Figure 4.13: Simulation relation for interleaving blocks consisting only of program
thread steps.
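The intended simulation relation after $\beta_l$ is:
\[
\begin{aligned}
&B(h^m, h^{l+1}, d^{s(l+1)}, \gamma^{l+1}, o^{l+1}, i, 0, ch^{l+1}, sv^{l+1}) = \\
&\quad eq^{E}_i(h^{l+1}, d^{s(l)}, o^l) \\
&\quad \land\ eq^{Sh}_i(h^{l+1}, d^{s(l)}, o^l) \\
&\quad \land\ \forall a \in o^l.O^{h}_i \setminus A_{ipcb_i}.\ h^{l+1}.M(a) = d^{s(l)}.M(a) \\
&\quad \land\ eq^{L}_i(h^{l+1}, \Delta(h^m, \gamma^{l+1}), o^l) \\
&\quad \land\ eq^{L}_i(h^m, d^{s(l)}, o^l)
\end{aligned}
\]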
We again apply Lemma 4.5 and Lemma 4.7 to conclude that the changes on
both machines will be equivalent and the simulation relation preserved. □
After proving the simulation for ISR blocks, we extend it to complete executions
of a single core including non-ISR interleaving blocks. We remind the reader
that in the execution sequence of a given processor, next to ISR blocks, we can
have interleaving blocks consisting of hypervisor steps without ISR steps (Figure
4.13), VMEXIT steps, and guest steps.
Theorem 4.11 (Execution Simulation Theorem Single Processor) For every
MIPSP IP schedule execution defined by an initial configuration h0 and an input
sequence β, if every extended schedule execution sequence starting from h0
contains only valid ISRs and is safe with respect to some ownership sequence, and d0
is a configuration equivalent to the initial configuration h0, then there exist an input
sequence ω and a step function s ∈ N → N such that, if we execute ω starting in
d0, then:
• the execution sequence defined by d0 and ω is an extended IP schedule for
a given processor i and
• the simulation relation B is maintained for every step l between configura-
tions hl and ds(l).
∀h0, β, d0, i, o.
IPsched(h0, β)
∧ (∀ρ. ∃o′. schedXT (h0, ρ) =⇒ (ISRV (h0, ρ) ∧ safeseqXT (h0, ρ, o, o′)))
∧B(h0, h0, d0, ε, o, i, 1, ∅, ∅)
=⇒ ∃ω, s. schedXT i(d0, ω)
∧ ∀l ∈ [0 : |β| − 1]. ∃γ, o′, ch, sv,m ∈ [0 : l]. B(hm, hl, ds(l), γ, o′, i, ipl, ch, sv)
where
ipl = IPp(h0, β, i, l)
Proof We instantiate ω with the reordered version of β, in which all ISR blocks
of processor i are reordered by the reoISRb function and the order of all steps
outside of ISR interleaving blocks stays unchanged. We apply Lemma 4.9 to the
reordered blocks to conclude that this instantiation results in an extended IP
schedule execution for processor i.
As a second step, we prove by induction over l the simulation relation for every
state hl and the corresponding state ds(l).
Induction base l = 0
The simulation relation for the base case
B(h0, h0, ds(l), γ0, o0, i, ip0, ch0, sv0)
follows directly from the hypothesis when we instantiate the parameters as follows.
s(l) = 0
γ0 = ε
o0 = o
ip0 = 1
ch0 = ∅
sv0 = ∅
Induction step l→ l + 1
Similarly to the proof of Theorem 4.10, we have a case split on βl, and the proof
closely follows the previous one. In general, we distinguish between steps
inside and outside of ISR blocks. Steps within ISR blocks of processor i are
covered by the previous proof. Steps outside of the ISR blocks can be program
thread steps of processor i, guest steps or VMEXIT steps of processor i, IPI steps,
and steps of other processors.
The major difference in the proof is that the reference initial configuration hm
of the simulation relation now has to be updated every time we pass an
interleaving point of the program thread, so that hm is always equal to the last
configuration that is a program thread interleaving point. Thus, for βl in an
ISR block, hm is always equal to the initial configuration of the ISR block, as in
the previous theorem.
• For steps within ISR blocks of processor i the proof is identical to that of the
previous theorem.
• If βl is a hypervisor step (i.e. a core step in system mode) of processor i
outside of the ISR blocks, we have the following cases:
– βl is the first step in an interleaving block. The proof is identical to
Case 1 of the previous theorem.
– βl is the last step in an interleaving block. The proof is identical to
Case 2 of the previous theorem.
– βl is a step, which does not border on an IP. The proof is identical to
Case 3 of the previous theorem.
• If βl is a guest step or a VMEXIT step it is executed simultaneously on both
machines and its execution does not change any configuration components
related to the simulation relation. The simulation relation parameters also
do not change. Thus the simulation relation follows from the induction
hypothesis.
• If βl is an IPI step or a step of another processor, the proof is identical to
Case 8.2 of the previous theorem.
△
Finally, we are ready to state and prove a theorem about the reordering of complete
execution sequences.
Theorem 4.12 (Execution Simulation Theorem) For every MIPSP IP schedule
execution defined by an initial configuration h0 and an input sequence β, if every
extended schedule execution sequence starting from h0 contains only valid ISRs
and is safe with respect to some ownership sequence, and d0 is a configuration
equivalent to the initial configuration h0, then there exist an input sequence ω and a
step function s ∈ N → N such that, if we execute ω starting in d0, then:
• the execution sequence defined by d0 and ω is an extended IP schedule and
• the simulation relation B is maintained for every step l between configura-
tions hl and ds(l).
∀h0, β, d0, o.
IPsched(h0, β)
∧ (∀ρ. ∃o′. schedXT (h0, ρ) =⇒ (ISRV (h0, ρ) ∧ safeseqXT (h0, ρ, o, o′)))
∧B(h0, h0, d0, ε, o, i, 1, ∅, ∅)
=⇒ ∃ω, s. schedXT (d0, ω)
∧ ∀l ∈ [0 : |β| − 1], i ∈ Pid . ∃γ, o′, ch, sv,m ∈ [0 : l].
B(hm, hl, ds(l), γ, o′, i, ipl, ch, sv)
where
ipl = IPp(h0, β, i, l)
Proof We prove the theorem by inductively applying Theorem 4.11 for all
processors. In doing so, we have to guarantee that applying Theorem 4.11 for a given
processor i preserves the properties established by preceding applications of the
theorem for other processors. By Lemma 4.9 we can conclude that reordering the
steps of one processor neither changes the order of steps nor splits interleaved
blocks of any other processor. In particular, an already reordered extended
schedule for one processor is not destroyed by subsequent reordering
and application of Theorem 4.11.
Considering the simulation relation, we know that subsequent reordering of ISR
blocks of a given processor i preserves shared equality and environment equality
(with respect to i). This is sufficient to conclude that the reordering of local steps of
processor i preserves the simulation relation of other processors.
△
Chapter 5
C-IL Semantics
In this chapter we introduce the semantics of the C Intermediate Language (C-IL),
a simplified programming language similar to C. C-IL was developed and
defined by Sabine Schmaltz and Andrey Shadrin [SS12] as part of the Verisoft
XT verification technology. In this work we omit some definitions that can be
looked up in [SS12], e.g. technical details of the expression evaluation. We
aim at stating a simulation relation between a C-IL computation and a MIPSP
execution. Thus our focus concerning C-IL is on the definition of the step function
and the compiler correctness theorem. Our compiler correctness theorem is based on
the work of Andrey Shadrin [Sha12] and Mikhail Kovalev [Kov13], which we adapt
to MIPSP. Furthermore, we define ownership safety for C-IL computations. We
assume C-IL safety in order to deduce safety of the MIPSP execution.
In Section 5.1 we present the sequential C-IL semantics, which we later extend
to the concurrent case in Section 5.2. In Section 5.3.4 we define ownership
safety. Finally, in Section 5.3 we state the compiler correctness theorem.
In this chapter we do not consider interrupts. They are added to the
semantics in the next chapter.
5.1 Sequential C-IL Semantics
5.1.1 C-IL Types
Like most programming languages, C-IL supports primitive, composed and pointer
types.
5.1.1.1 Primitive types
Definition 5.1 (Primitive types) The set of primitive types is defined by the void
type void and six types for signed (intn) and unsigned (uintn) integer values, where
n defines the size of the type in bits, i.e. 8, 16 or 32 bits. We implement the
Boolean type using integers.
\[ \mathbb{T}_P \stackrel{\mathrm{def}}{=} \{\mathrm{void}\} \cup \{\mathrm{int}n \mid n \in \{8, 16, 32\}\} \cup \{\mathrm{uint}n \mid n \in \{8, 16, 32\}\} \]
The range for each integer type is defined as follows:
• intn - n-bit signed integers
\[ \mathrm{int}n \stackrel{\mathrm{def}}{=} \{-2^{n-1}, \ldots, 2^{n-1} - 1\} \]
• uintn - n-bit unsigned integers
\[ \mathrm{uint}n \stackrel{\mathrm{def}}{=} \{0, \ldots, 2^{n} - 1\} \]
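As an illustration outside the formal development, the ranges of Definition 5.1 can be tabulated with a few lines of Python. The helper names int_range and uint_range are hypothetical; only the bit widths 8, 16 and 32 are taken from the definition.

```python
# Illustrative sketch; the function names are assumptions, not part of C-IL.
WIDTHS = (8, 16, 32)  # bit widths admitted by Definition 5.1


def int_range(n):
    """Inclusive bounds of the signed n-bit integer type."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


def uint_range(n):
    """Inclusive bounds of the unsigned n-bit integer type."""
    return 0, 2 ** n - 1


for n in WIDTHS:
    print(f"int{n}: {int_range(n)}   uint{n}: {uint_range(n)}")
```

The signed range is asymmetric: it contains one more negative value than positive values.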
5.1.1.2 Complex types
The array type array(t, n) is defined by the type of the array elements t and their
number n.
For each C-IL program we individually define a set of struct type names TC.
A struct type has a unique name in TC and consists of several fields.
We define two kinds of pointer types in C-IL. By ptr(t) we denote the set of
typed pointers to values of type t. A function pointer of type fptr(t, T) points
to a function with a return value of type t, where T is the list of the types of the
function's parameters.
Definition 5.2 (C-IL types) The set of C-IL types T is defined inductively and
consists of primitive types, pointer types, array types and struct types.
• primitive types: t ∈ TP =⇒ t ∈ T,
• struct types: tc ∈ TC =⇒ struct tc ∈ T,
• array types: t ∈ T, n ∈ N =⇒ array(t, n) ∈ T,
• (regular) pointer types: t ∈ T =⇒ ptr(t) ∈ T,
• function pointer types: t ∈ T, T ∈ T∗ =⇒ fptr(t, T ) ∈ T.
5.1.1.3 Type qualifiers
C-IL provides two type qualifiers: volatile and const. Type qualifiers give the
compiler additional information that allows or forbids certain optimizations of
the code. They partially define the behavior of instances of the different types,
in particular how they can be accessed.
Definition 5.3 (Set of type qualifiers) We define the set of type qualifiers Q as:
Q def= {volatile, const}.
• The qualifier volatile defines shared data. It is used as a compiler directive
that prevents the compiler from reordering volatile memory accesses and
some other optimizations, i.e. volatile data is never cached in registers.
This is highly useful in memory-mapped-I/O and in concurrent applications.
• The qualifier const defines constant values. A memory access performed
on a variable of a type qualified with the const qualifier can not be a write.
This may enable the compiler to perform additional optimizations on the
code, relying on the fact that a given value in memory is never overwritten.
Definition 5.4 (Qualified C-IL types) We define the set of qualified types TQ
inductively as a composition of qualifiers and unqualified types from T. TQ consists of:
• qualified primitive types: q ⊆ Q ∧ t ∈ TP =⇒ (q, t) ∈ TQ,
• qualified struct types: q ⊆ Q ∧ tc ∈ TC =⇒ (q, struct tc) ∈ TQ,
• qualified array types: q ⊆ Q ∧ t ∈ TQ, n ∈ N =⇒ (q, array(t, n)) ∈ TQ,
• qualified (regular) pointer types: q ⊆ Q ∧ t ∈ TQ =⇒ (q, ptr(t)) ∈ TQ,
• qualified function pointer types: q ⊆ Q ∧ t ∈ TQ, T ∈ TQ∗ =⇒ (q, fptr(t, T)) ∈ TQ.
We allow the set of qualifiers q ⊆ Q to be empty. This makes it easy to
convert unqualified types to qualified types by simply adding an empty qualifier
set.
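Definition 5.5 (Converting Qualified Types to Unqualified) To turn a qualified
type into an unqualified one, we throw away all the qualifiers.
\[
\mathit{qt2t}(x) \stackrel{\mathrm{def}}{=}
\begin{cases}
t & x = (q, t) \land t \in \mathbb{T}_P \\
\mathrm{ptr}(\mathit{qt2t}(x')) & x = (q, \mathrm{ptr}(x')) \\
\mathrm{array}(\mathit{qt2t}(x'), n) & x = (q, \mathrm{array}(x', n)) \\
\mathrm{fptr}(\mathit{qt2t}(x'), [\mathit{qt2t}(X_1), \ldots, \mathit{qt2t}(X_n)]) & x = (q, \mathrm{fptr}(x', [X_1, \ldots, X_n])) \\
\mathrm{struct}\ t_C & x = (q, \mathrm{struct}\ t_C)
\end{cases}
\]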
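The recursion of Definition 5.5 can be sketched in Python as follows. The encoding of types is an assumption made only for this illustration: primitive types are strings such as "int32", composed types are tagged tuples, and a qualified type is a pair of a qualifier set and a shape whose nested types are again qualified.

```python
# Illustrative sketch of qt2t; the tuple encoding of types is an assumption.
# Unqualified: "int32" | ("ptr", t) | ("array", t, n) | ("fptr", t, [t1, ...]) | ("struct", name)
# Qualified:   (qualifiers, shape), where nested types inside shape are qualified.


def qt2t(x):
    """Drop all qualifiers from a qualified type, recursively."""
    _qualifiers, shape = x              # throw away the outermost qualifier set
    if isinstance(shape, str):          # primitive type
        return shape
    tag = shape[0]
    if tag == "ptr":
        return ("ptr", qt2t(shape[1]))
    if tag == "array":
        return ("array", qt2t(shape[1]), shape[2])
    if tag == "fptr":
        return ("fptr", qt2t(shape[1]), [qt2t(p) for p in shape[2]])
    if tag == "struct":                 # struct names carry no nested qualifiers
        return shape
    raise ValueError(f"not a qualified type: {x!r}")


volatile_int = (frozenset({"volatile"}), "int32")
const_ptr = (frozenset({"const"}), ("ptr", volatile_int))
print(qt2t(const_ptr))
```

Converting in the other direction, as remarked after Definition 5.4, amounts to pairing every nested type with the empty qualifier set.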
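In some cases, e.g. case distinctions in definitions, we want to distinguish
between pointer and array types. Therefore we define the following type predicates.
Definition 5.6 (Type Predicates) The predicates isptr, isarray and isfptr denote
that a given type is a pointer type, an array type, or a function pointer type,
respectively.
\[ \mathit{isptr}, \mathit{isarray}, \mathit{isfptr} :: \mathbb{T} \to \mathbb{B} \]
\[
\begin{aligned}
\mathit{isptr}(t) &\stackrel{\mathrm{def}}{=} \exists t'.\ t = \mathrm{ptr}(t') \\
\mathit{isarray}(t) &\stackrel{\mathrm{def}}{=} \exists t', n'.\ t = \mathrm{array}(t', n') \\
\mathit{isfptr}(t) &\stackrel{\mathrm{def}}{=} \exists t', T.\ t = \mathrm{fptr}(t', T)
\end{aligned}
\]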
5.1.2 Values
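Values in C-IL are bit-strings paired with a type specifier. In the following
we define the value set for each type, where V denotes the
set of variables and Fname denotes the set of function names.
With sizeptr ∈ N we refer to the architecture-dependent size of pointers in
bytes. For MIPSP we set sizeptr = 4.
• Primitive values:
\[ n \in \{8, 16, 32\} \land i \in \mathbb{B}^{n} \Rightarrow \mathrm{val}(i, \mathrm{int}n) \in val_{\mathrm{int}n} \]
\[ n \in \{8, 16, 32\} \land i \in \mathbb{B}^{n} \Rightarrow \mathrm{val}(i, \mathrm{uint}n) \in val_{\mathrm{uint}n} \]
• Structs:
\[ t_C \in \mathbb{T}_C \land B \in (\mathbb{B}^{8})^{*} \Rightarrow \mathrm{val}(B, \mathrm{struct}\ t_C) \in val_{struct} \]
• Pointer and array values:
\[ a \in \mathbb{B}^{8 \cdot size_{ptr}} \land t \in \mathbb{T} \land (\mathit{isptr}(t) \lor \mathit{isarray}(t)) \Rightarrow \mathrm{val}(a, t) \in val_{ptr(t)} \subseteq val_{ptr} \]
• Local reference values:
\[ v \in \mathbb{V} \land o \in \mathbb{N} \land i \in \mathbb{N} \land t \in \mathbb{T} \land (\mathit{isptr}(t) \lor \mathit{isarray}(t)) \Rightarrow \mathrm{lref}((v, o), i, t) \in val_{lref} \]
• Function pointer values:
\[ a \in \mathbb{B}^{8 \cdot size_{ptr}} \land t \in \mathbb{T} \land \mathit{isfptr}(t) \Rightarrow \mathrm{val}(a, t) \in val_{fptr} \]
• Symbolic function values:
\[ f \in \mathbb{F}_{name} \land \mathit{isfptr}(t) \Rightarrow \mathrm{fun}(f, t) \in val_{fun} \]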
Definition 5.7 (C-IL values) We define the set of all possible values val as the
union of the possible values of all types.
\[ val \stackrel{\mathrm{def}}{=} val_{\mathrm{int}n} \cup val_{\mathrm{uint}n} \cup val_{struct} \cup val_{ptr} \cup val_{lref} \cup val_{fptr} \cup val_{fun} \]
5.1.3 Expressions
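Definition 5.8 (Operators) C-IL provides the set of unary operators O1 and the
set of binary operators O2 defined in the following way.
\[ \mathbb{O}_1 \subset \{\oplus \mid \oplus \in val \rightharpoonup val\}, \qquad \mathbb{O}_1 \stackrel{\mathrm{def}}{=} \{\texttt{-}, \texttt{\~{}}, \texttt{!}\} \]
\[ \mathbb{O}_2 \subset \{\oplus \mid \oplus \in val \times val \rightharpoonup val\} \]
\[ \mathbb{O}_2 \stackrel{\mathrm{def}}{=} \{\texttt{+}, \texttt{-}, \texttt{*}, \texttt{/}, \texttt{\%}, \texttt{<<}, \texttt{>>}, \texttt{<}, \texttt{>}, \texttt{<=}, \texttt{>=}, \texttt{==}, \texttt{!=}, \texttt{\&}, \texttt{|}, \texttt{\^{}}, \texttt{\&\&}, \texttt{||}\} \]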
Definition 5.9 (Expressions) We define the set of C-IL expressions E inductively.
It contains:
• Constants: c ∈ val⇒ c ∈ E,
• Variable names: v ∈ V⇒ v ∈ E,
• Function names: f ∈ Fname ⇒ f ∈ E,
• Unary operations on expressions: e ∈ E ∧ ⊕ ∈ O1 ⇒ ⊕e ∈ E,
• Binary operations on expressions: e0, e1 ∈ E ∧ ⊕ ∈ O2 ⇒ (e0 ⊕ e1) ∈ E,
• Conditional: e, e0, e1 ∈ E⇒ (e ? e0 : e1) ∈ E,
• Type cast: t ∈ TQ ∧ e ∈ E⇒ (t)e ∈ E ,
• Dereferencing pointer: e ∈ E⇒ ∗e ∈ E,
• Address of expression: e ∈ E⇒ &e ∈ E ,
• Field access: e ∈ E ∧ f ∈ F⇒ (e).f ∈ E, where F denotes the set of field
names,
• Size of Type: t ∈ TQ ⇒ sizeof(t) ∈ E,
• Size of Expression: e ∈ E ⇒ sizeof(e) ∈ E.
We define the following two shorthands in correspondence with the established
C syntax.
• Access to the elements of a given array a.
a[i] def= ∗(a + i)
• Access to fields in structs from a pointer a.
a→f def= (∗(a)).f
5.1.4 Statements
Definition 5.10 (C-IL Statements) We define the set of C-IL statements S
inductively as follows:
• Assignment: e0, e1 ∈ E⇒ e0 = e1 ∈ S
• Goto: l ∈ N⇒ goto l ∈ S
• If-Not-Goto: e ∈ E ∧ l ∈ N⇒ ifnot e goto l ∈ S
• Function Call: e0, e ∈ E ∧ E ∈ E∗ ⇒ e0 = call e(E) ∈ S, where the
expression list E denotes the sequence of parameters passed to the function.
• Procedure Call: e ∈ E ∧ E ∈ E∗ ⇒ call e(E) ∈ S
• Return: e ∈ E⇒ return e ∈ S and return ∈ S
5.1.5 Program
Definition 5.11 (C-IL Program) A C-IL program π consists of a set of global
variables π.VG, a type table π.TF for struct types, and a function table π.F.
\[ prog_{\text{C-IL}} \stackrel{\mathrm{def}}{=} [\mathcal{V}_G \in (\mathbb{V} \times \mathbb{T}_Q)^{*},\ T_F \in \mathbb{T}_C \rightharpoonup (\mathbb{F} \times \mathbb{T}_Q)^{*},\ \mathcal{F} \in \mathbb{F}_{name} \rightharpoonup fun_{\text{C-IL}}] \]
• π.VG – Denotes the sequence of global variable names and their types.
• π.TF – The type table maps names of struct types to a list of field names
and the corresponding field types.
• π.F – The function table maps function names to function table entries
(Definition 5.12).
Definition 5.12 (Function Table Entry) Function table entries are defined by the
type funC-IL.
\[ fun_{\text{C-IL}} \stackrel{\mathrm{def}}{=} [rettype \in \mathbb{T}_Q,\ npar \in \mathbb{N},\ \mathcal{P} \in \mathbb{S}^{*} \cup \{\mathrm{extern}\},\ \mathcal{V} \in (\mathbb{V} \times \mathbb{T}_Q)^{*}] \]
Function table entries consist of:
• rettype – type of the function's return value,
• npar – number of the function's parameters,
• P – function body, i.e. list of statements, or extern,
• V – local variables and parameters of the function paired with their types.
In C-IL programs, assembly functions and compiler intrinsics are declared as
external functions. This is expressed by the special value extern of the function body
in the function table entries of these functions. The first npar entries in V denote
the function parameters.
5.1.6 Configuration
As the next step we define the state of C-IL programs.
Definition 5.13 (C-IL Configuration) A C-IL configuration consists of a
byte-addressable global memory c.M and a stack c.s.
\[ C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} [\mathcal{M} \in \mathbb{B}_{gm} \to \mathbb{B}^{8},\ s \in frame^{*}_{\text{C-IL}}] \]
M denotes the global memory of the program, where
\[ \mathbb{B}_{gm} \subset \mathbb{B}^{8 \cdot size_{ptr}} \]
denotes the set of global memory addresses. It stores the global variables. A C-IL
stack is a list of C-IL stack frames (Definition 5.14). Local variables are stored in
the stack.
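Definition 5.14 (C-IL Stack Frame) C-IL stack frames are defined by the type
frameC-IL. A frame consists of a local memory, a return destination, a function
name and a location.
\[ frame_{\text{C-IL}} \stackrel{\mathrm{def}}{=} [\mathcal{M}_E \in \mathbb{V} \rightharpoonup (\mathbb{B}^{8})^{*},\ rds \in val_{ptr} \cup val_{lref} \cup \{\bot\},\ f \in \mathbb{F}_{name},\ loc \in \mathbb{N}] \]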
• ME – The local memory for local variables and parameters maps variable
names to a byte-string representation of the value of the corresponding
variable.
• rds – The return value destination describes where the return value of a
function call has to be stored. ⊥ denotes the absence of a return value and
a return value destination.
• f – The name of the function to which the frame belongs.
• loc – The location counter defines the next statement to be executed.
5.1.7 Context
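The execution of a C-IL program requires information that is specific to
the compiler and the underlying machine. This additional information includes
the specification of the global data of the program, i.e. addresses, offsets and sizes. We
call this environmental information the context of the C-IL program and parametrize
C-IL executions with it.
Definition 5.15 (C-IL Context) The context of a C-IL program is defined by the
type contextC-IL.
\[
\begin{aligned}
context_{\text{C-IL}} \stackrel{\mathrm{def}}{=} [\ & size_{ptr} \in \mathbb{N}, \\
& alloc_{gvar} \in \mathbb{V} \rightharpoonup \mathbb{B}^{8 \cdot size_{ptr}}, \\
& \mathcal{F}_{addr} \in \mathbb{F}_{name} \rightharpoonup \mathbb{B}^{8 \cdot size_{ptr}}, \\
& size_{struct} \in \mathbb{T}_C \rightharpoonup \mathbb{N}, \\
& size\_t \in \mathbb{T}_P, \\
& offset \in \mathbb{T}_C \times \mathbb{F} \rightharpoonup \mathbb{N}, \\
& cast \in val \times \mathbb{T}_Q \rightharpoonup val, \\
& endianness \in \{\mathrm{little}, \mathrm{big}\}, \\
& intrinsics \in \mathbb{F}_{name} \rightharpoonup fun_{\text{C-IL}}, \\
& R_{extern} \in \mathbb{F}_{name} \rightharpoonup 2^{val \times C_{\text{C-IL}} \times C_{\text{C-IL}}}\ ]
\end{aligned}
\]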
• sizeptr denotes the size of pointer types in bytes.
• allocgvar maps global variable names to addresses.
• Faddr maps function names to addresses.
• sizestruct returns the size of the struct type with a given name.
• size_t denotes the type of the value returned by the sizeof operator.
• offset returns the byte-offset of a given field in a struct.
• cast does type casting of a given value to a given type.
• endianness denotes the order in which bytes are stored in the memory.
• intrinsics returns a function table entry for compiler intrinsic functions.
• Rextern defines the transition relation for external procedures, i.e. functions
that are not implemented on the C-IL level.
Intrinsics are pre-defined functions provided by the compiler. A function call
to an intrinsic does not create a new stack frame; instead, the compiler inlines the
intrinsic implementation into the program code [Sch13]. On the C-IL level the effect of
compiler intrinsics is defined together with the effect of external function calls by
Rextern. We consider two compiler intrinsics: one for read-modify-write operations
and one for VMRUN. The only external procedure that we define implements the
send-ipi operation. We define Rextern later in this chapter, when we define the
C-IL transition function.
The VMRUN intrinsic has neither a return value nor parameters. The compiler
replaces it by the VMRUN MIPSP instruction.
Definition 5.16 (VMRUN FTE) The function table entry for the VMRUN intrinsic
is defined as follows.
\[ \theta.intrinsics(\mathrm{vmrun}) \stackrel{\mathrm{def}}{=} [(\emptyset, \mathrm{void}),\ 0,\ \mathrm{extern},\ \varepsilon] \]
The read-modify-write intrinsic does not have a return value. It takes four
parameters: an address rds at which the value read from the memory location dest
is to be stored, the address dest of a volatile memory location, a compare value cmp,
and a value exchng to be written in case of a successful comparison.
Definition 5.17 (Read-Modify-Write FTE) The function table entry for the
read-modify-write intrinsic is defined as follows.
\[
\begin{aligned}
\theta.intrinsics(\mathrm{rmw}) \stackrel{\mathrm{def}}{=} [(\emptyset, \mathrm{void}),\ 4,\ \mathrm{extern},\ & (rds, (\emptyset, \mathrm{ptr}(\{\mathrm{volatile}\}, \mathrm{int32}))) \\
\circ\ & (dest, (\emptyset, \mathrm{ptr}(\emptyset, \mathrm{int32}))) \\
\circ\ & (cmp, (\emptyset, \mathrm{int32})) \\
\circ\ & (exchng, (\emptyset, \mathrm{int32}))]
\end{aligned}
\]
Definition 5.18 (Read-Modify-Write Predicate) The predicate rmw denotes whether
a given expression e is a reference to the read-modify-write intrinsic.
\[ rmw^{\pi,\theta}_{c}(e \in \mathbb{E}) \in \mathbb{B} \stackrel{\mathrm{def}}{=} [e]^{\pi,\theta}_{c} = \mathrm{fun}(\mathrm{rmw}, t) \lor [e]^{\pi,\theta}_{c} = \mathrm{val}(\theta.\mathcal{F}_{addr}(\mathrm{rmw}), t) \]
where
\[ t = \mathrm{fptr}(\mathrm{void},\ \mathrm{ptr}(\mathrm{int32}) \circ \mathrm{ptr}(\mathrm{int32}) \circ \mathrm{int32} \circ \mathrm{int32}) \]
With the information from the context it is straightforward to calculate the
size of a given C-IL type.
Definition 5.19 (Size of Types) The function sizeθ returns the number of bytes
required to store a value of the given type.
\[
size_{\theta}(t \in \mathbb{T}) \in \mathbb{N} =
\begin{cases}
n/8 & t = \mathrm{int}n \lor t = \mathrm{uint}n \\
\theta.size_{ptr} & \mathit{isptr}(t) \lor \mathit{isfptr}(t) \\
n \cdot size_{\theta}(t') & t = \mathrm{array}(t', n) \\
\theta.size_{struct}(t_C) & t = \mathrm{struct}\ t_C
\end{cases}
\]
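A minimal Python sketch of Definition 5.19 follows. The context is modelled as a dictionary with the hypothetical keys "size_ptr" and "size_struct", and types use an illustrative tuple encoding; neither is part of the formal model.

```python
# Illustrative sketch of the size function; encodings and key names are assumptions.
# Types: "int8".."uint32" | ("ptr", t) | ("fptr", t, params) | ("array", t, n) | ("struct", name)


def size(theta, t):
    """Number of bytes needed to store a value of type t in context theta."""
    if isinstance(t, str):
        if t.startswith("uint"):
            return int(t[4:]) // 8
        if t.startswith("int"):
            return int(t[3:]) // 8
        raise ValueError(f"size is undefined for {t!r}")   # e.g. void
    tag = t[0]
    if tag in ("ptr", "fptr"):
        return theta["size_ptr"]
    if tag == "array":
        return t[2] * size(theta, t[1])
    if tag == "struct":
        return theta["size_struct"][t[1]]
    raise ValueError(f"unknown type: {t!r}")


# Pointer size 4 as chosen for MIPSP in the text; the struct "pair" is made up.
theta = {"size_ptr": 4, "size_struct": {"pair": 8}}
print(size(theta, ("array", ("ptr", "int8"), 3)))
```

Note that the size of a struct is looked up in the context rather than computed from its fields, since padding and alignment are compiler-specific.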
5.1.8 Memory
We model the memory as a flat byte-addressable memory. In the global memory we
map addresses to bytes. In the local memory we map variable names to byte-strings.
At the same time, C-IL values are typed bit-strings. In order to interpret
memory bytes as C-IL values and vice versa, we define the following two functions.
Definition 5.20 (Converting Values) The function val2bytesθ converts C-IL values
to byte-strings.
\[ val2bytes_{\theta} :: val \rightharpoonup (\mathbb{B}^{8})^{*} \]
\[
val2bytes_{\theta}(v) \stackrel{\mathrm{def}}{=}
\begin{cases}
bytes(b) & v = \mathrm{val}(b, t) \land \theta.endianness = \mathrm{little} \\
rev(bytes(b)) & v = \mathrm{val}(b, t) \land \theta.endianness = \mathrm{big} \\
\text{undefined} & \text{otherwise}
\end{cases}
\]
The function bytes2valθ converts byte-strings to C-IL values.
\[ bytes2val_{\theta} :: (\mathbb{B}^{8})^{*} \times \mathbb{T} \rightharpoonup val \]
\[
bytes2val_{\theta}(B, t) \stackrel{\mathrm{def}}{=}
\begin{cases}
\mathrm{val}(bits(B), t) & t \neq \mathrm{struct}\ t_C \land \theta.endianness = \mathrm{little} \\
\mathrm{val}(bits(rev(B)), t) & t \neq \mathrm{struct}\ t_C \land \theta.endianness = \mathrm{big} \\
\text{undefined} & \text{otherwise}
\end{cases}
\]
The functions bits and bytes convert byte-strings to bit-strings and bit-strings
to byte-strings, respectively. The function rev reverses the order of the elements
of a string. Depending on the endianness of the underlying architecture, we may
have to reverse the order of the bytes.
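The effect of the endianness parameter in Definition 5.20 can be illustrated with Python's built-in integer conversions. This is only a sketch under simplifying assumptions: the payload of a value is represented as a non-negative Python integer instead of a bit-string, the type is reduced to its size in bytes, and the function names mirror the definition but are otherwise hypothetical.

```python
# Illustrative sketch; payloads are unsigned Python ints, not typed bit-strings.


def val2bytes(endianness, payload, nbytes):
    """Byte-string of a value; endianness is "little" or "big"."""
    return payload.to_bytes(nbytes, endianness)


def bytes2val(endianness, data):
    """Inverse direction: reassemble the payload from a byte-string."""
    return int.from_bytes(data, endianness)


word = 0x11223344
little = val2bytes("little", word, 4)
big = val2bytes("big", word, 4)
print(little.hex(), big.hex())
```

As in the definition, the big-endian byte-string is the reverse of the little-endian one, and both directions agree as long as the same endianness is used.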
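Definition 5.21 (Reading from Global Memory) The function readM reads a
byte-string of length s from a global memory M starting at address a.
\[ read_{\mathcal{M}} :: (\mathbb{B}_{gm} \to \mathbb{B}^{8}) \times \mathbb{B}^{8 \cdot size_{ptr}} \times \mathbb{N} \to (\mathbb{B}^{8})^{*} \]
\[
read_{\mathcal{M}}(\mathcal{M}, a, s) \stackrel{\mathrm{def}}{=}
\begin{cases}
read_{\mathcal{M}}(\mathcal{M},\ a +_{8 \cdot size_{ptr}} 0^{8 \cdot size_{ptr} - 1}1,\ s - 1) \circ \mathcal{M}(a) & s > 0 \\
\varepsilon & s = 0
\end{cases}
\]
Definition 5.22 (Writing to Global Memory) The function writeM writes a
byte-string B to global memory M starting at address a.
\[ write_{\mathcal{M}} :: (\mathbb{B}_{gm} \to \mathbb{B}^{8}) \times \mathbb{B}^{8 \cdot size_{ptr}} \times (\mathbb{B}^{8})^{*} \to (\mathbb{B}_{gm} \to \mathbb{B}^{8}) \]
\[
write_{\mathcal{M}}(\mathcal{M}, a, B)(x) =
\begin{cases}
B[\langle x \rangle - \langle a \rangle] & \langle x \rangle - \langle a \rangle \in \{0, \ldots, |B| - 1\} \\
\mathcal{M}(x) & \text{otherwise}
\end{cases}
\]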
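Definitions 5.21 and 5.22 can be mirrored by a small Python sketch. Several modelling choices here are assumptions: the memory is a dictionary from integer addresses to byte values, unmapped addresses read as 0, addresses are 32 bits wide and wrap around in both functions, and the result of a read is a list whose element k is the byte at address a + k.

```python
# Illustrative sketch of global memory access; the dict model is an assumption.
ADDR_BITS = 32  # 8 * sizeptr with sizeptr = 4


def read_m(mem, a, s):
    """Read s bytes starting at address a; element k is the byte at a + k."""
    return [mem.get((a + k) % 2 ** ADDR_BITS, 0) for k in range(s)]


def write_m(mem, a, data):
    """Return a new memory that agrees with mem except on [a, a + len(data))."""
    new_mem = dict(mem)  # memories are treated as values and never mutated
    for k, byte in enumerate(data):
        new_mem[(a + k) % 2 ** ADDR_BITS] = byte
    return new_mem


m0 = {}
m1 = write_m(m0, 0x1000, [1, 2, 3])
print(read_m(m1, 0x0FFF, 5))
```

Returning a fresh dictionary instead of updating in place matches the functional style of the definition, in which writeM yields a new memory function.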
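Definition 5.23 (Reading from Local Memory) The function readME reads a
byte-string of length s from the variable v of local memory ME starting at offset o.
\[ read_{\mathcal{M}_E} :: (\mathbb{V} \rightharpoonup (\mathbb{B}^{8})^{*}) \times \mathbb{V} \times \mathbb{N} \times \mathbb{N} \rightharpoonup (\mathbb{B}^{8})^{*} \]
\[ read_{\mathcal{M}_E}(\mathcal{M}_E, v, o, s) \stackrel{\mathrm{def}}{=} \mathcal{M}_E(v)[o + s - 1] \circ \ldots \circ \mathcal{M}_E(v)[o] \]
If s + o > |ME(v)| or v ∉ dom(ME), the function is undefined for the given
parameters.
Definition 5.24 (Writing to Local Memory) The function writeME writes the
byte-string B to variable v of local memory ME starting at offset o.
\[ write_{\mathcal{M}_E} :: (\mathbb{V} \rightharpoonup (\mathbb{B}^{8})^{*}) \times \mathbb{V} \times \mathbb{N} \times (\mathbb{B}^{8})^{*} \rightharpoonup (\mathbb{V} \rightharpoonup (\mathbb{B}^{8})^{*}) \]
\[
write_{\mathcal{M}_E}(\mathcal{M}_E, v, o, B)(w)[i] \stackrel{\mathrm{def}}{=}
\begin{cases}
B[i - o] & w = v \land i \in \{o, \ldots, o + |B| - 1\} \\
\mathcal{M}_E(w)[i] & \text{otherwise}
\end{cases}
\]
If |B| + o > |ME(v)| or v ∉ dom(ME), the function is undefined for the
given parameters.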
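The partiality of the local memory functions in Definitions 5.23 and 5.24 can be made concrete with the following Python sketch. It assumes the intended reading that a write changes only the bytes of variable v in the written range; local memory is modelled as a dictionary from variable names to bytes objects, and None stands for "undefined". These representation choices are assumptions of the illustration.

```python
# Illustrative sketch of local memory access; None models "undefined".


def read_me(me, v, o, s):
    """Read s bytes of variable v starting at offset o."""
    if v not in me or o + s > len(me[v]):
        return None
    return me[v][o:o + s]


def write_me(me, v, o, data):
    """Overwrite len(data) bytes of variable v starting at offset o."""
    data = bytes(data)
    if v not in me or o + len(data) > len(me[v]):
        return None
    new_me = dict(me)  # all other variables are left untouched
    new_me[v] = me[v][:o] + data + me[v][o + len(data):]
    return new_me


frame_memory = {"x": bytes([1, 2, 3, 4]), "y": bytes([7])}
updated = write_me(frame_memory, "x", 2, [9, 9])
print(updated["x"].hex(), read_me(updated, "x", 1, 2).hex())
```

In contrast to the global memory, a variable has a fixed length, so accesses that run past its end are rejected instead of spilling into a neighbouring variable.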
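Definition 5.25 (Reading from a C-IL Configuration) For a given C-IL
configuration c and a given C-IL context θ, the function readθ returns the value
stored in the memory addressed by the pointer x.
\[ read_{\theta} :: C_{\text{C-IL}} \times val \rightharpoonup val \]
\[
read_{\theta}(c, x) \stackrel{\mathrm{def}}{=}
\begin{cases}
bytes2val_{\theta}(read_{\mathcal{M}}(c.\mathcal{M}, a, size_{\theta}(t)), t) & x = \mathrm{val}(a, \mathrm{ptr}(t)) \\
bytes2val_{\theta}(read_{\mathcal{M}_E}(c.s[i].\mathcal{M}_E, v, o, size_{\theta}(t)), t) & x = \mathrm{lref}((v, o), i, \mathrm{ptr}(t)) \\
read_{\theta}(c, \mathrm{val}(a, \mathrm{ptr}(t))) & x = \mathrm{val}(a, \mathrm{array}(t, n)) \\
read_{\theta}(c, \mathrm{lref}((v, o), i, \mathrm{ptr}(t))) & x = \mathrm{lref}((v, o), i, \mathrm{array}(t, n)) \\
\text{undefined} & \text{otherwise}
\end{cases}
\]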
Definition 5.26 (Writing to a C-IL Configuration) The function writeθ writes the
value y to the memory of a C-IL configuration c in a context θ at the address
defined by the pointer x.
\[ write_{\theta} :: C_{\text{C-IL}} \times val \times val \rightharpoonup C_{\text{C-IL}} \]
\[
write_{\theta}(c, x, y) \stackrel{\mathrm{def}}{=}
\begin{cases}
c[\mathcal{M} := write_{\mathcal{M}}(c.\mathcal{M}, a, val2bytes_{\theta}(y))] & x = \mathrm{val}(a, \mathrm{ptr}(t)) \land y = \mathrm{val}(b, t) \\
c' & x = \mathrm{lref}((v, o), i, \mathrm{ptr}(t)) \land y = \mathrm{val}(b, t) \\
write_{\theta}(c, \mathrm{val}(a, \mathrm{ptr}(t)), y) & x = \mathrm{val}(a, \mathrm{array}(t, n)) \\
write_{\theta}(c, \mathrm{lref}((v, o), i, \mathrm{ptr}(t)), y) & x = \mathrm{lref}((v, o), i, \mathrm{array}(t, n)) \\
\text{undefined} & \text{otherwise}
\end{cases}
\]
where c′ differs from c only in the local memory of the i-th stack frame:
\[ c'.s[i].\mathcal{M}_E = write_{\mathcal{M}_E}(c.s[i].\mathcal{M}_E, v, o, val2bytes_{\theta}(y)) \]
5.1.9 Auxiliary Definitions
In the following we define some auxiliary functions and notation, which
we later use to compute the next C-IL configuration in the execution of C-IL
programs.
We index the frames on the stack of a C-IL configuration: the first frame
has index zero and the top-most frame has index equal to the length of the stack
minus one. In the subsequent definitions we assume a non-empty stack; only for
a non-empty stack are the auxiliary functions total and well-defined.
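Definition 5.27 (Accessing Frame Elements) To refer concisely to the components
of the frame i ∈ [0 : (|c.s| − 1)] of the C-IL configuration c ∈ CC-IL, we define
the following notation:
\[
\begin{aligned}
c.\mathcal{M}_i &\stackrel{\mathrm{def}}{=} c.s[i].\mathcal{M}_E \\
c.rds_i &\stackrel{\mathrm{def}}{=} c.s[i].rds \\
c.f_i &\stackrel{\mathrm{def}}{=} c.s[i].f \\
c.loc_i &\stackrel{\mathrm{def}}{=} c.s[i].loc.
\end{aligned}
\]
For the top-most frame on the stack we additionally define:
\[
\begin{aligned}
c.\mathcal{M}_{top} &\stackrel{\mathrm{def}}{=} c.s[|c.s| - 1].\mathcal{M}_E \\
c.rds_{top} &\stackrel{\mathrm{def}}{=} c.s[|c.s| - 1].rds \\
c.f_{top} &\stackrel{\mathrm{def}}{=} c.s[|c.s| - 1].f \\
c.loc_{top} &\stackrel{\mathrm{def}}{=} c.s[|c.s| - 1].loc.
\end{aligned}
\]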
Definition 5.28 (Topmost Frame Index) The function top returns the index of the
top-most stack frame in a given C-IL configuration.
\[ top(c \in C_{\text{C-IL}}) \in \mathbb{N} \stackrel{\mathrm{def}}{=} |c.s| - 1 \]
Definition 5.29 (Removing the Topmost Frame) The function dropframe removes
the top-most stack frame from a C-IL configuration.
\[ drop_{frame}(c \in C_{\text{C-IL}}) \in C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} c[s \mapsto c.s[0 : |c.s| - 2]] \]
Definition 5.30 (Incrementing Location Counter) The function incloc increments
the location counter of the top-most frame.
\[ inc_{loc}(c \in C_{\text{C-IL}}) \in C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} c[loc_{top} \mapsto c.loc_{top} + 1] \]
Definition 5.31 (Setting Location Counter) The function setloc assigns a given
value to the location counter of the top-most stack frame.
\[ set_{loc}(c \in C_{\text{C-IL}}, l \in \mathbb{N}) \in C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} c[loc_{top} \mapsto l] \]
Definition 5.32 (Next Statement) The function stmtnext computes the next
statement to be executed.
\[ stmt_{next}(c \in C_{\text{C-IL}}, \pi \in prog_{\text{C-IL}}) \in \mathbb{S} \stackrel{\mathrm{def}}{=} \pi.\mathcal{F}(c.f_{top}).\mathcal{P}[c.loc_{top}] \]
Definition 5.33 (Setting Return Destination) The function setrds sets the return
destination field of the topmost stack frame to a given value v.
\[ set_{rds}(c \in C_{\text{C-IL}}, v \in val_{ptr} \cup val_{lref} \cup \{\bot\}) \in C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} c[rds_{top} \mapsto v] \]
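Definition 5.34 (Creating a New Frame) The function framenew creates a new
stack frame and puts it on the top of the stack.
\[ frame_{new}(c \in C_{\text{C-IL}}, \pi \in prog_{\text{C-IL}}, f \in \mathbb{F}_{name}) \in C_{\text{C-IL}} \stackrel{\mathrm{def}}{=} c[s \mapsto (\mathcal{M}'_E, \bot, f, 0) \circ c.s] \]
where M′E has to offer enough space for the local variables and the parameters
of f.
\[
\begin{aligned}
& \mathcal{V}' = \pi.\mathcal{F}(f).\mathcal{V} \\
& npar' = \pi.\mathcal{F}(f).npar \\
& \forall i \in [npar' : |\mathcal{V}'| - 1].\ \mathcal{V}'[i] = (v_i, t_i) \Longrightarrow |\mathcal{M}'_E(v_i)| = size_{\theta}(t_i) \\
& \forall i \in [0 : npar' - 1].\ \mathcal{M}'_E(v_i) = val2bytes_{\theta}([E[i]]^{\pi,\theta}_{c})
\end{aligned}
\]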
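Definition 5.35 (Is-function) The predicate is-function denotes whether a given
function pointer is valid, i.e. the pointer corresponds to a function name.
\[
\begin{aligned}
\textit{is-function}(v \in val_{fptr}, f \in \mathbb{F}_{name}, \theta \in context_{\text{C-IL}}) \in \mathbb{B} \stackrel{\mathrm{def}}{=}{} & v = \mathrm{val}(b, \mathrm{fptr}(t, T)) \land \theta.\mathcal{F}_{addr}(f) = b \\
\lor{} & v = \mathrm{fun}(f, \mathrm{fptr}(t, T)) \land f \in dom(\pi.\mathcal{F})
\end{aligned}
\]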
5.1.10 Operational semantics
The C-IL operational semantics presented in this section defines the execution of
C-IL programs. We define the transition from one C-IL configuration to the next by
a case distinction on the next statement. An important part of the semantics is the
evaluation of expressions. Expressions are evaluated to C-IL values depending
on the current C-IL configuration c ∈ CC-IL, the given program π ∈ progC-IL and
a context θ ∈ contextC-IL. During expression evaluation the type of the evaluated
expression must be determined. The function
\[ \tau^{\pi,\theta}_{c} :: \mathbb{E} \rightharpoonup \mathbb{T}_Q \]
returns the type of a given expression.
The expression evaluation function
\[ [\,\cdot\,]^{\pi,\theta}_{c} :: \mathbb{E} \rightharpoonup val \]
returns the value of a given expression.
Both functions are defined by a case split on the different kinds of expressions from
Definition 5.9 and by structural induction on the evaluated expression. The
functions are partial and return ⊥ for expressions that cannot be evaluated
correctly, e.g. pointer dereferencing ∗(e) can be evaluated only if e is of pointer
or array type.
Neither the simulation relation between MIPSP and C-IL nor the extensions
to the C-IL semantics that we introduce in the next chapter require changes in
the expression evaluation. Thus we omit the formal definitions here. They can be
found in [SS12] and [Sch13].
Depending on the type of the next statement, one of the following rules defines
the next step in the execution of a C-IL program. In the following definitions, when
we say location counter, we refer to the location counter of the top-most stack
frame.
Definition 5.36 (Assignment) The execution of an assignment statement consists
of writing the value of the right hand expression at the location defined by the
left hand expression and increasing the location counter.
\[ \frac{stmt_{next}(c, \pi) = (e_0 = e_1)}{\pi, \theta \vdash c \to inc_{loc}(write_{\theta}(c, [\&e_0]^{\pi,\theta}_{c}, [e_1]^{\pi,\theta}_{c}))} \]
Definition 5.37 (Goto) The execution of a goto statement consists of updating the
value of the current location counter with the provided value.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{goto}\ l}{\pi, \theta \vdash c \to set_{loc}(c, l)} \]
Definition 5.38 (If-Not-Goto (success)) The execution of an if-not-goto statement
depends on the evaluation of the condition expression. If the expression e evaluates
to zero we set the location counter to the provided value l.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{ifnot}\ e\ \mathrm{goto}\ l \qquad zero_{\theta}([e]^{\pi,\theta}_{c})}{\pi, \theta \vdash c \to set_{loc}(c, l)} \]
Definition 5.39 (If-Not-Goto (failure)) If the expression e does not evaluate to
zero we increase the location counter.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{ifnot}\ e\ \mathrm{goto}\ l \qquad \lnot zero_{\theta}([e]^{\pi,\theta}_{c})}{\pi, \theta \vdash c \to inc_{loc}(c)} \]
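The control-flow rules of Definitions 5.37 to 5.39 can be animated by a small Python sketch. It is deliberately simplified and rests on assumptions that are not part of C-IL: frames are dictionaries, the program maps a function name directly to its statement list, statements are tagged tuples, and the condition of an ifnot statement is a Python callable on the top frame instead of a C-IL expression.

```python
# Illustrative sketch of the goto and if-not-goto rules; encodings are assumptions.


def step(program, stack):
    """Execute one goto/ifnot statement; the top-most frame is the last list element."""
    top = dict(stack[-1])                       # copy: configurations are values
    stmt = program[top["f"]][top["loc"]]        # corresponds to stmt_next
    kind = stmt[0]
    if kind == "goto":
        top["loc"] = stmt[1]                    # set_loc
    elif kind == "ifnot":
        _, cond, target = stmt
        if cond(top) == 0:
            top["loc"] = target                 # condition is zero: jump
        else:
            top["loc"] = top["loc"] + 1         # otherwise: inc_loc
    else:
        raise NotImplementedError(kind)
    return stack[:-1] + [top]


PROGRAM = {"f": [("ifnot", lambda frame: frame["x"], 2), ("goto", 0), ("goto", 2)]}
taken = step(PROGRAM, [{"f": "f", "loc": 0, "x": 0}])
fallthrough = step(PROGRAM, [{"f": "f", "loc": 0, "x": 5}])
print(taken[-1]["loc"], fallthrough[-1]["loc"])
```

The name ifnot is easy to misread: the jump is taken exactly when the condition evaluates to zero, which is how a compiler would translate the exit of a while loop.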
Definition 5.40 (Function Call or Procedure Call) The execution of a function call
statement (with or without return value) is defined if the expression e evaluates
to a function f, f is not an external function, the local memory of the created
frame offers enough space for local variables and parameters, and the expression
list E is a valid parameter list according to the function declaration. The
execution of the statement pushes the new stack frame onto the stack and increases
the location counter in the old topmost frame.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{call}\ e(E) \lor stmt_{next}(c, \pi) = (e_0 = \mathrm{call}\ e(E)) \qquad \textit{is-function}([e]^{\pi,\theta}_{c}, f) \qquad \theta.\mathcal{F}(f).\mathcal{P} \neq \mathrm{extern}}{\pi, \theta \vdash c \to frame_{new}(inc_{loc}(set_{rds}(c, rds)), \pi, f)} \]
where rds denotes the return destination for the function call.
\[
rds =
\begin{cases}
\bot & stmt_{next}(c, \pi) = \mathrm{call}\ e(E) \\
[\&e_0]^{\pi,\theta}_{c} & stmt_{next}(c, \pi) = (e_0 = \mathrm{call}\ e(E))
\end{cases}
\]
Definition 5.41 (Function Return with Result) The execution of a return statement
with return value and return destination is defined by dropping the top stack frame
and writing the return value to the return destination.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{return}\ e \land c.rds_{top(c)-1} \neq \bot}{\pi, \theta \vdash c \to write_{\theta}(set_{rds}(drop_{frame}(c), \bot),\ drop_{frame}(c).rds_{top},\ [e]^{\pi,\theta}_{c})} \]
Definition 5.42 (Function Return without Result) The execution of a return
statement without return value or without return destination is defined by dropping
the top stack frame.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{return} \lor (stmt_{next}(c, \pi) = \mathrm{return}\ e \land c.rds_{top(c)-1} = \bot)}{\pi, \theta \vdash c \to drop_{frame}(c)} \]
Definition 5.43 (External Procedure Call) The execution of an external procedure
call statement is defined if the expression e evaluates to a function f, f is an
extern function, the expression list E is a valid parameter list according to the
function declaration, and the transition relation for extern functions defines the
transition from c to c′ under the parameters E. In the new configuration c′ all
stack frames but the topmost one are unchanged, the location counter in the topmost
frame is increased, and the function stays the same.
\[ \frac{stmt_{next}(c, \pi) = \mathrm{call}\ e(E) \qquad \textit{is-function}([e]^{\pi,\theta}_{c}, f) \qquad \theta.\mathcal{F}(f).\mathcal{P} = \mathrm{extern} \qquad (([E_0]^{\pi,\theta}_{c}, \cdots, [E_{|E|-1}]^{\pi,\theta}_{c}), c, c') \in \theta.R_{extern}(f)}{\pi, \theta \vdash c \to c'} \]
In the following we present the transition relations for the external functions
that we define.
Definition 5.44 (rmw Transition Relation) The definition of the compiler intrinsic
read-modify-write has two cases. The difference is that, if the compare operation
is successful, we additionally write the exchange value to the destination memory.
\[
\begin{aligned}
((rds, dest, cmp, exchng), c, c') \in \theta.R_{extern}(\mathrm{rmw}) \Longrightarrow{} & read_{\theta}(c, dest) \neq cmp \land c' = inc_{loc}(write_{\theta}(c, rds, read_{\theta}(c, dest))) \\
\lor{} & read_{\theta}(c, dest) = cmp \\
& \land c' = inc_{loc}(write_{\theta}(write_{\theta}(c, rds, read_{\theta}(c, dest)), dest, exchng))
\end{aligned}
\]
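The two cases of Definition 5.44 correspond to a compare-and-swap on memory. The Python sketch below illustrates only the memory effect under simplifying assumptions: memory is a word-granular dictionary from addresses to integers, the location counter update is omitted, and the function name rmw is reused purely for readability.

```python
# Illustrative sketch of the rmw memory effect; the word-level dict model is an assumption.


def rmw(mem, rds, dest, cmp, exchng):
    """Store the old value of dest at rds; replace dest by exchng if it equalled cmp."""
    old = mem[dest]
    new_mem = dict(mem)
    new_mem[rds] = old            # inner write of the definition, done in both cases
    if old == cmp:
        new_mem[dest] = exchng    # outer write, only on a successful comparison
    return new_mem


LOCK, RESULT = 0x10, 0x20
memory = {LOCK: 0, RESULT: 99}
acquired = rmw(memory, RESULT, LOCK, 0, 1)      # lock was free: take it
contended = rmw(acquired, RESULT, LOCK, 0, 1)   # lock already taken: no change to LOCK
print(acquired, contended)
```

A caller can tell whether the exchange happened by comparing the value stored at rds with cmp, which is the usual way to build a spin lock on top of such an intrinsic.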
Definition 5.45 (vmrun Transition Relation)
A function call to the intrinsic vmrun appears on the C-IL level as a NOOP.
\[
(\bot, c, c') \in \theta.R_{extern}(vmrun) \implies c' = inc_{loc}(c)
\]
Definition 5.46 (sendipi Transition Relation)
The external function sendipi is implemented in MIPSP assembler and writes
to the APIC interrupt command register (ICR). The implementation consists of several
instructions, the last of which is a store word.
The written word $data \in \mathbb{B}^{32}$ is defined according to the ICR layout and Software
Condition 1 as follows:
\[
\begin{aligned}
data.DEST &= 0^{8} \\
data.reserved &= 0^{4} \\
data.DSH &= 1^{2} \\
data.reserved &= 0^{5} \\
data.DS &= 1 \\
data.DM &= 0 \\
data.MT &= 0^{3} \\
data.VEC &= 0^{8}.
\end{aligned}
\]
The written address $ICR_a = 1^{20}0^{9}100$ is computed from the APIC base
address and the ICR offset.
\[
(\bot, c, c') \in \theta.R_{extern}(sendipi) \implies c' = inc_{loc}(write_\theta(c, ICR_a, data))
\]
For the moment we consider sendipi as an ordinary memory write. In the next
chapter we will extend its semantics and model its effect in the MIPSP machine.
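To make the written word concrete, the following Python sketch packs the ICR fields into a 32-bit value. It assumes that the fields are listed from the most significant to the least significant bit, and that the address reads as twenty ones, nine zeros and the bits 100; both are assumptions about the layout, and the names `ICR_FIELDS` and `pack_fields` are illustrative only.

```python
# (name, width in bits, value), assumed to be ordered most significant first.
ICR_FIELDS = [
    ("DEST", 8, 0b00000000),
    ("reserved_high", 4, 0b0000),
    ("DSH", 2, 0b11),
    ("reserved_low", 5, 0b00000),
    ("DS", 1, 0b1),
    ("DM", 1, 0b0),
    ("MT", 3, 0b000),
    ("VEC", 8, 0b00000000),
]


def pack_fields(fields):
    """Concatenate bit fields into one word; returns (word, total width)."""
    word, total = 0, 0
    for _name, width, value in fields:
        assert 0 <= value < (1 << width)   # value must fit into its field
        word = (word << width) | value
        total += width
    return word, total


data, width = pack_fields(ICR_FIELDS)
icr_address = int("1" * 20 + "0" * 9 + "100", 2)
print(hex(data), width, hex(icr_address))   # 0xc1000 32 0xfffff004
```

Under these assumptions the field widths add up to exactly 32 bits, which is a useful sanity check on the layout.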
5.2 CC-IL Semantics
For the execution of several C-IL programs in a multiprocessor system we need
extended semantics. We define the Concurrent C-IL semantics to talk about
several C-IL threads running at the same time. Every thread has a unique ID
$tid \in Tid$ ($Tid \subset \mathbb{N}$).
Definition 5.47 (Concurrent C-IL Configuration)
A concurrent C-IL configuration consists of a single shared global memory and
multiple stacks, one per thread.
\[
C_{CC\text{-}IL} \overset{def}{=} [\mathcal{M} \in \mathbb{B}_{gm} \to \mathbb{B}^{8},\ s \in Tid \to frame^{*}_{C\text{-}IL}]
\]
From a configuration $c \in C_{CC\text{-}IL}$ we can construct the sequential configuration
$c(t) \in C_{C\text{-}IL}$ for a given thread with ID $t \in Tid$.
\[
c(t) \overset{def}{=} (c.\mathcal{M},\ c.s(t))
\]
Definition 5.48 (CC-IL Operational Semantics)
A CC-IL execution is the composition of threads’ steps interleaved at statements.
The threads execute C-IL steps and can access both their local resources
(stack) and the shared memory.
\[
\frac{\pi,\theta \vdash c(t) \to (\mathcal{M}', s') \qquad c' = (\mathcal{M}',\ c.s[t \mapsto s'])}
{\pi,\theta \vdash c \to c'}
\]
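The rule of Definition 5.48 can be read operationally: project the concurrent configuration to thread t, perform one sequential step, and write the resulting memory and stack back. The sketch below follows this reading in Python, assuming a configuration is a pair of a memory dictionary and a dictionary of per-thread stacks. `toy_step` is a hypothetical stand-in for the sequential C-IL transition and is not taken from the thesis.

```python
def concurrent_step(config, t, sequential_step):
    """One CC-IL step of thread t: project, step sequentially, write back."""
    memory, stacks = config
    new_memory, new_stack = sequential_step((memory, stacks[t]))  # step on c(t)
    new_stacks = dict(stacks)
    new_stacks[t] = new_stack          # corresponds to c.s[t -> s']
    return new_memory, new_stacks


def toy_step(seq_config):
    """Hypothetical sequential step: bump a shared counter and the top location."""
    memory, stack = seq_config
    function, loc = stack[-1]
    return {**memory, "x": memory["x"] + 1}, stack[:-1] + ((function, loc + 1),)


start = ({"x": 0}, {0: (("main", 0),), 1: (("main", 0),)})
after = concurrent_step(start, 1, toy_step)
```

Only the stack of the stepping thread changes, while the memory update is visible to all threads.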
5.2.0.1 Auxiliary Functions and Notation
We denote a single CC-IL step by
\[
\pi,\theta \vdash c \to c'.
\]
When we want to state explicitly that a given CC-IL step is made by thread $t$,
we write
\[
\pi,\theta \vdash c \to_{t} c'.
\]
A sequence of CC-IL steps of thread $t$ is denoted by
\[
\pi,\theta \vdash c \to^{*}_{t} c'.
\]
Definition 5.49 (Accessing Frame Elements)
Since in the concurrent configuration we have several stacks, we have to refine
our notation for accessing frame elements. To refer to components of the i-th
frame of a given stack s ($i \in [0 : (|s| - 1)]$) we define:
\[
\begin{aligned}
s.\mathcal{M}_i &\overset{def}{=} s[i].\mathcal{M}_E \\
s.rds_i &\overset{def}{=} s[i].rds \\
s.f_i &\overset{def}{=} s[i].f \\
s.loc_i &\overset{def}{=} s[i].loc.
\end{aligned}
\]
For the topmost frame on the stack we additionally define:
\[
\begin{aligned}
s.\mathcal{M}_{top} &\overset{def}{=} s[|s| - 1].\mathcal{M}_E \\
s.rds_{top} &\overset{def}{=} s[|s| - 1].rds \\
s.f_{top} &\overset{def}{=} s[|s| - 1].f \\
s.loc_{top} &\overset{def}{=} s[|s| - 1].loc.
\end{aligned}
\]
5.3 Compiler Correctness
In this work we talk about execution of assembler code generated by a compiler
for a given C-IL program. We want to verify properties on the C-IL level and
transfer them to the MIPSP level. Thus we want to define a simulation relation
between both. Then we can state that computations of the C-IL machine simulate
computations of the MIPSP machine. Having this we can prove properties for the
C-IL computations which will hold for the MIPSP computations. The compiler
consistency is the simulation relation between the C-IL program and the generated
instruction sequence.
Compiler consistency for C-IL and MIPS-86 in the absence of interrupts has
been examined and a formal definition has been given by Andrey Shadrin [Sha12]
building on previous work by W. J. Paul and others [LPP05a, DPS09].
We note that in the context of a hypervisor program, the compiler correctness
theorem talks only about the execution of the hypervisor code. MIPSP processors
execute compiled hypervisor code in system mode. In guest mode MIPSP processors
execute guest steps, while the hypervisor C-IL machine does not proceed.
If we think of an extended MIPSP in which guest steps also execute instructions,
the processor state will obviously change during guest mode. Thus guest execution
would break the consistency between the hypervisor C-IL machine and the
MIPSP configuration. The consistency therefore has to be defined depending on the
processor mode: in guest mode, instead of the processor registers, one should look
up the saved kernel processor state in memory. We omit this case split in
the consistency relation here, since MIPSP does not provide a detailed model of
guest execution and of the switch between kernel context and guest context. Still,
we state the compiler correctness theorem excluding guest steps. This way it is
applicable not only to MIPSP but also to an extended version of MIPSP that
provides more functionality in guest mode.
In this chapter we define compiler consistency close to [Sha12]. In the next
chapter we extend the C-IL semantics by inter-processor interrupts, extend the
compiler correctness theorem for the corresponding interrupt cases, and prove it.
5.3.1 Consistency Points
Non-optimizing compilers translate every C-IL step into one or more assembler
steps. A step function maps C-IL step numbers to assembler step numbers.
For non-optimizing compilers consistency holds for every C-IL step.
For an optimizing compiler consistency holds only at several points of the
execution which we call consistency points. The consistency points are defined
on both levels C-IL and MIPSP . The compiler defines the set of C-IL consistency
points and the corresponding MIPSP states. Additionally on the MIPS level we
have more consistency points than on the C-IL level. So more than one hardware
configuration may be consistent with a single software configuration.
Defining the set of consistency points is an art in itself and depends strongly
on the particular compiler and consistency relation. The more consistency points
a given compiler guarantees, the fewer optimizations it can perform on the generated
code. In this work we define a set of consistency points which is sufficient for our
proofs. It satisfies the conditions of our reordering theorem and allows us to prove
the compiler correctness theorem inductively.
5.3.1.1 Software Consistency Points
The C-IL states where consistency holds we call software consistency points. A
C-IL state is a consistency point if the next statement is:
• a function call¹,
• the first statement in a given function,
• a return statement,
• a statement performing a shared memory access,
• the first statement after a function call,
• the first statement after a statement performing a shared memory access.
¹ The compiler sets a consistency point before/after function calls to C-IL functions, external
functions and compiler intrinsics.
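Definition 5.50 (C-IL Consistency Locations)
Given a program π and a context θ the predicate cp denotes whether a given
location loc in the execution of function f is a consistency point.
\[
\begin{aligned}
cp^{\pi,\theta}(f \in \mathbb{F}_{name},\ loc \in \mathbb{N}) \in \mathbb{B} \;\overset{def}{=}\;
& (loc = 0) \\
\vee\; & \pi.\mathcal{F}(f).\mathcal{P}[loc] \in \{\mathbf{return},\ \mathbf{return}\ e\} \\
\vee\; & \pi.\mathcal{F}(f).\mathcal{P}[loc] \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \\
\vee\; & \pi.\mathcal{F}(f).\mathcal{P}[loc - 1] \in \{e_0 = \mathbf{call}\ e(E),\ \mathbf{call}\ e(E)\} \\
\vee\; & vol^{\pi,\theta}_{c}(\pi.\mathcal{F}(f).\mathcal{P}[loc - 1])
\end{aligned}
\]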
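As an illustration, the disjuncts of the predicate cp can be evaluated over a function body one location at a time. The Python sketch below follows the formula of Definition 5.50 term by term. It assumes that a function body is a list of statement records with a `kind` field and that the volatile-access check is supplied as a callable; this representation is hypothetical and only serves the example.

```python
def cp(body, loc, is_volatile):
    """Evaluate the disjuncts of the predicate cp for location `loc`.

    `body` is a list of statements (dicts with a "kind" entry) and
    `is_volatile` decides whether a statement performs a volatile access.
    """
    if loc == 0:                                   # first statement of the function
        return True
    if body[loc]["kind"] in ("return", "call"):    # return or function call
        return True
    previous = body[loc - 1]
    return previous["kind"] == "call" or is_volatile(previous)


def volatile_flag(stmt):
    """Hypothetical volatile check: read a precomputed flag."""
    return stmt.get("volatile", False)


body = [
    {"kind": "assign"},
    {"kind": "call"},
    {"kind": "assign"},
    {"kind": "assign", "volatile": True},
    {"kind": "assign"},
    {"kind": "assign"},
    {"kind": "return"},
]
consistency_locations = [loc for loc in range(len(body)) if cp(body, loc, volatile_flag)]
```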
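Definition 5.51 (C-IL Consistency Point)
The predicate $CP^{\pi,\theta}_{C\text{-}IL}$ denotes whether a given configuration c is a consistency
point.
\[
CP^{\pi,\theta}_{C\text{-}IL}(c \in C_{C\text{-}IL}) \in \mathbb{B} \;\overset{def}{=}\; cp^{\pi,\theta}(c.f_{top},\ c.loc_{top})
\]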
If we consider a function call to an external function and relate it to the set
of C-IL consistency points, we can conclude that on the C-IL level function calls
to external functions are bordered by consistency points and that there is no other
consistency point in between.
Thus the following software condition arises.
Software Condition 8 (Consistent External Functions) All external functions
preserve compiler consistency. External functions do not access shared data.
5.3.1.2 MIPSP Consistency Points
We define the MIPSP consistency points to overlap with the interleaving points
defined in Chapter 3. We recall that we distinguish among different types of
interleaving points.
In Chapter 3 we already mentioned the set of instruction addresses $A_{cp}$. It
contains pointers to the first instruction to be executed after a consistency point
that originates from the compiled program. In the absence of interrupts, all
hypervisor interleaving points are defined by the compiler through $A_{cp}$.
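Definition 5.52 (Hypervisor Consistency Point of Processor i)
The predicate $hypCP_{C\text{-}IL}$ denotes whether the k-th step in the execution
$h \xrightarrow{\beta} h'$ is a hypervisor consistency point for processor i.
\[
\begin{aligned}
& hypCP_{C\text{-}IL}(h \in C_M,\ i \in Pid,\ \beta \in (\Sigma_M)^{*},\ k \in \mathbb{N}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad k \in [0 : |\beta| - 1] \;\wedge\; i = \beta_k.pid \;\wedge\; core(\beta_k) \;\wedge\; h^{k}.pc_{\beta_k.pid} \in A_{cp}
\end{aligned}
\]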
We recall that, according to the definitions in Chapter 3, each interleaving block in
a reordered MIPSP execution starts and ends at an interleaving point. Hypervisor
blocks, which we consider in our simulation theorem, start at a hypervisor interleaving
point but may end at any kind of interleaving point. The next hypervisor
block of the same processor then starts at another hypervisor interleaving point.
This implies that consistency holds at the beginning and at the end of a hypervisor
block. Thus we also include the guest and IPI interleaving points in the set of
consistency points.
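Definition 5.53 (MIPSP Consistency Point)
The predicate $CP_{MIPSP}$ denotes whether the k-th step in the execution $h \xrightarrow{\beta} h'$
is a consistency point.
\[
\begin{aligned}
& CP_{MIPSP}(h \in C_M,\ \beta \in (\Sigma_M)^{*},\ k \in \mathbb{N}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad k \in [0 : |\beta| - 1] \;\wedge\; \exists i \in Pid.\ hypCP_{C\text{-}IL}(h, i, \beta, k) \;\vee\; guestIP(h, \beta, k) \;\vee\; ipi(\beta_{k-1})
\end{aligned}
\]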
Although each interleaving block in a reordered MIPSP execution
is bordered by interleaving points, it may still happen that some threads in the
considered execution do not end in a consistent state. Therefore we have to distinguish
two cases for a given consistency point h. Threads that are running after h, i.e. that
perform further steps, have been interleaved properly at their own last consistency point.
Threads that are not running after h may have been interrupted before reaching their
own consistency point.
Definition 5.54 (Running Thread)
The predicate running denotes that a given processor i will perform further
steps after k in the given transition sequence.
\[
running_i(\beta, k) \;\overset{def}{=}\; \exists l \in [k : |\beta| - 1] : \beta_l.pid = i
\]
Our compiler correctness theorem is stated inductively on interleaving blocks
between two consistency points. Thus we need to identify the next consistency
point in an execution.
Definition 5.55 (Next Consistency Point)
The function nextCP returns the index of the next consistency point in the
execution sequence $h \xrightarrow{\beta} h'$ starting from configuration $h^{k}$.
\[
nextCP(h \in C_M,\ \beta \in (\Sigma_M)^{*},\ k \in \mathbb{N}) \in \mathbb{N} \cup \{\bot\} \;\overset{def}{=}\;
\begin{cases}
m & CP_{MIPSP}(h, \beta, m) \;\wedge\; \forall l \in [k + 1 : m - 1] : \neg CP_{MIPSP}(h, \beta, l) \\
\bot & \text{otherwise.}
\end{cases}
\]
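The predicates running and nextCP are simple scans over the transition sequence. The Python sketch below illustrates them under two simplifying assumptions: the sequence β is reduced to the list of processor IDs of its steps, and the MIPSP consistency point predicate is precomputed as a list of booleans. It also assumes that the next consistency point is searched strictly after index k, and it represents ⊥ by `None`.

```python
def running(pids, i, k):
    """True if processor i performs a step at some index l >= k."""
    return any(pid == i for pid in pids[k:])


def next_cp(is_cp, k):
    """Index of the first consistency point after k, or None if there is none."""
    for m in range(k + 1, len(is_cp)):
        if is_cp[m]:          # all indices in [k+1 : m-1] were not consistency points
            return m
    return None


pids = [0, 1, 0, 1]                          # processor performing each step
is_cp = [True, False, False, True, False]    # precomputed consistency point flags
```

For instance, `next_cp(is_cp, 0)` yields index 3, while `next_cp(is_cp, 3)` finds no further consistency point.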
5.3.2 Compiler Information
In order to formalize the compiler correctness theorem we need to examine some
details of the compiler and the code generation function. We need to know where
local variables and function parameters are saved on the stack. A compiler calling
convention defines how functions receive their parameters and return a result.
Furthermore the compiler assigns special functions to some GPRs, e.g. the stack
pointer. We stick to the calling convention from [Sha12]. In Table 5.1 we list
registers alongside their function and alias.

Index        Alias          Usage
0            zero           always zero
1 ... 13     t0 ... t12     temporary
14           rv             return value
15 ... 18    i0 ... i3      input arguments
19 ... 28    t13 ... t22    temporary
29           rsp            stack pointer
30           rbp            stack frame base pointer
31           t23            temporary

Table 5.1: Special Usage of GPRs
Furthermore, we introduce two disjoint lists of register indexes, caller and
callee. The exact separation of registers into these two lists is irrelevant for us.
The important fact is that together caller and callee contain all registers
except zero, rsp and rbp.
Definition 5.56 (Calling Convention)
We define the following calling convention:
• All GPRs from caller are caller saved. If the caller relies on their
values it has to save them, because the callee may change them.
• All GPRs from callee are callee saved. The callee guarantees to the
caller that after return these registers have the same values as before the call.
• A function’s result is always passed through register rv.
• The first four parameters of a given function are passed to it by its caller in
registers i0, . . . , i3.
• If a given function has more than four parameters, then all parameters
except the first four are pushed by the caller on the stack.
• The stack pointer rsp points to the first byte of the last occupied word of
the topmost stack frame.
• The stack frame base pointer rbp points to the first byte of the previous
base pointer field of the topmost stack frame.
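The parameter passing rules of this calling convention can be sketched as follows. The Python function below only separates the parameters that travel in registers i0, ..., i3 from those that go on the stack; the order in which the caller pushes the remaining parameters is not modeled here, and the names `ARG_REGISTERS` and `place_parameters` are illustrative assumptions.

```python
ARG_REGISTERS = ["i0", "i1", "i2", "i3"]   # input argument registers of Table 5.1


def place_parameters(params):
    """Split parameters into register-passed and stack-passed ones."""
    in_registers = dict(zip(ARG_REGISTERS, params))   # first four parameters
    on_stack = list(params[len(ARG_REGISTERS):])      # all remaining parameters
    return in_registers, on_stack


registers, stack_params = place_parameters([10, 20, 30, 40, 50, 60])
```

With six parameters, the first four end up in i0 to i3 and the last two are passed on the stack.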
In Figure 5.1 we present the stack layout. For every called function a function
frame is allocated on the stack, which grows downwards. Not only the number of
frames on the stack but also the frame’s layout changes dynamically on function
calls.
[Figure 5.1: C-IL stack layout. The stack is growing downwards. Labels: Stack Base, High Memory, Low Memory, Stack Frame i, Stack Frame i+1, RBP, RSP.]
All frames offer space for function arguments, the return address, the base
pointer of the previous frame, callee save registers, local variables and temporaries
(Figure 5.2). Additionally some frames contain caller save registers. These
frames belong to functions which are executing a function call statement and are
waiting for a callee to return. For simplicity we assume that all callee save
registers are always stored on the stack. The region allocated for function parameters
offers space for all parameters, despite the fact that the first four parameters are
passed in registers.
The allocation and the deallocation of stack space is done both on the caller
and on the callee side. On the one hand, this is done by the pre-call-code and
post-call-code generated for function calls on the caller side (Figure 5.3). On the other
hand, the compiler generates for every function f a prolog code and an epilog
code. The prolog and the epilog are executed by the callee. First the pre-call-code
pushes the caller save registers on the stack, allocates the parameter region,
stores the parameters (if more than four) and pushes the return address on the stack.
Then the prolog allocates further stack space and sets up the frame for the function
execution. At the end of the function the epilog deallocates the frame of the
function and restores the base pointer and the program counter. At that point the
caller save registers are still on the stack; the post-call-code restores them and
deallocates the corresponding stack region.
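Definition 5.57 (Static Compiler Information CC-IL)
We define the record $info_{CC\text{-}IL}$ to consist of the static information about the
compiled program.
\[
\begin{aligned}
info_{CC\text{-}IL} \;\overset{def}{=}\; [\ & code \in (\mathbb{B}^{32})^{*}, \\
& cba \in \mathbb{B}^{8 \cdot size_{ptr}}, \\
& csize \in \mathbb{N}, \\
& stmtcode \in \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{B}^{8 \cdot size_{ptr}}, \\
& pcode \in \mathbb{F}_{name} \to \mathbb{B}^{8 \cdot size_{ptr}}, \\
& pcodesize \in \mathbb{F}_{name} \to \mathbb{N}, \\
& ecode \in \mathbb{F}_{name} \to \mathbb{B}^{8 \cdot size_{ptr}}, \\
& ecodesize \in \mathbb{F}_{name} \to \mathbb{N}, \\
& pccode \in \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{B}^{8 \cdot size_{ptr}}, \\
& stackba \in Tid \to \mathbb{B}^{8 \cdot size_{ptr}}, \\
& stacksize \in \mathbb{N}, \\
& fsize \in \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{N}, \\
& fsizet \in \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{N}, \\
& fsizep \in \mathbb{F}_{name} \to \mathbb{N}, \\
& fsizev \in \mathbb{F}_{name} \to \mathbb{N}, \\
& lvarnum \in \mathbb{F}_{name} \to \mathbb{N}, \\
& lvarreg \in \mathbb{V} \times \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{B}^{5} \cup \{\bot\}, \\
& lvaroff \in \mathbb{V} \times \mathbb{F}_{name} \to \mathbb{N}, \\
& paroff \in \mathbb{V} \times \mathbb{F}_{name} \to \mathbb{N}, \\
& regoff \in \mathbb{F}_{name} \times \mathbb{N} \times \mathbb{B}^{5} \to \mathbb{N}, \\
& cp \in \mathbb{F}_{name} \times \mathbb{N} \to \mathbb{B}\ ]
\end{aligned}
\]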
• info.code the compiled code.
• info.cba code region base address.
• info.csize code size in bytes.
• info.stmtcode maps a function name and a location to the address of the
first instruction of the compiled statement.
• info.pcode maps a function name to the address of the first instruction of
the function’s prolog.
[Figure 5.2: C-IL stack frame layout. The stack is growing downwards.]
• info.pcodesize maps a function name to the size in bytes of the function’s
prolog.
• info.ecode maps a function name to the address of the first instruction of
the function’s epilog.
• info.ecodesize maps a function name to the size in bytes of the function’s
epilog.
• info.pccode maps a function name and a location of a function call in it to
the address of the first instruction of the post-call-code of the function call.
• info.stackba returns the stack base address for a given thread.
• info.stacksize maximum stack size in bytes.
• info.fsize maps a function and a location to the number of bytes allocated
on the stack for the function’s frame.
• info.fsizet maps a function and a location to the number of bytes allocated
on the stack for temporaries.
• info.fsizep maps a function to the number of bytes allocated on the stack for
its parameters.
• info.fsizev maps a function to the number of bytes allocated on the stack
for its local variables.
• info.lvarnum maps a given function to the number of its local variables.
• info.lvarreg maps the local variables of a given function and a location to
the register index where they are stored.
[Figure 5.3: Function call code.]
• info.lvaroff maps the local variables (excluding parameters) of a given function
to their offset in bytes within the corresponding stack frame (below the
frame base address).
• info.paroff maps the parameters of a given function to their offset in bytes
within the corresponding stack frame (above the frame base address).
• info.regoff maps a function name, a location and the index of a callee save or
caller save register to an offset in bytes relative to the frame base pointer.
The callee save registers that contain local variables of $f_i$ are saved in
frame $i + 1$; thus the offset defined by info.regoff for callee save registers
reaches into the next frame.
• info.cp denotes the consistency points for a given function and a program
location.
Definition 5.58 (Static Compiler Information C-IL)
Since in all cases we consider only the stack of a single thread, we define
$info_{C\text{-}IL}$ as a single-thread version of $info_{CC\text{-}IL}$, with the only difference that it stores
a single stack base address.
[Figure 5.4: C-IL stack frame distance.]
Definition 5.59 (Static Compiler Information Transformation)
The function sinfo returns for a given thread and a given concurrent static
information the corresponding single-thread static information.
\[
sinfo :: Tid \times info_{CC\text{-}IL} \to info_{C\text{-}IL}
\]
\[
sinfo(t, info).xxx =
\begin{cases}
info.xxx & \text{if } xxx \neq stackba \\
info.stackba(t) & \text{otherwise}
\end{cases}
\]
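The transformation sinfo copies every record field except the stack base address, which is specialized to the given thread. A minimal Python sketch, assuming the static compiler information is modeled as a dictionary in which `stackba` maps thread IDs to addresses; the concrete field values are made up for the example.

```python
def sinfo(t, info):
    """Single-thread view of the concurrent static compiler information."""
    single = dict(info)                       # every field is copied unchanged ...
    single["stackba"] = info["stackba"][t]    # ... except the stack base address
    return single


# Hypothetical concurrent compiler information with two threads.
info_cc = {"cba": 0x1000, "csize": 64, "stackba": {0: 0x8000, 1: 0x9000}}
info_t1 = sinfo(1, info_cc)
```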
Definition 5.60 (Base Pointer Distance)
The function $dist_{C\text{-}IL}$ returns the number of bytes between the base pointers of
frames on the stack s (Figure 5.4). For the topmost frame it returns the number
of bytes between the base pointer and the stack pointer.
\[
dist_{C\text{-}IL}(i \in \mathbb{N},\ s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL}) \in \mathbb{N} \;\overset{def}{=}\;
\begin{cases}
info.fsizev(s.f_i) + 4 \cdot |callee| + info.fsizet(s.f_i, s.loc_i) & \text{if } i = top(s) \\
info.fsize(s.f_i, s.loc_i) - info.fsizep(s.f_i) + info.fsizep(s.f_{i+1}) & \text{otherwise}
\end{cases}
\]
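Definition 5.61 (Base Address)
The function $ba_{C\text{-}IL}$ returns the base address of the i-th frame on the stack s.
\[
ba_{C\text{-}IL}(i \in \mathbb{N},\ s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL}) \in \mathbb{B}^{32} \;\overset{def}{=}\;
\begin{cases}
info.stackba - bin_{32}(info.fsizep(s.f_i) - 4) & \text{if } i = 0 \\
ba_{C\text{-}IL}(0, s, info) - bin_{32}\Big(\sum\limits_{j < i} dist_{C\text{-}IL}(j, s, info)\Big) & \text{otherwise}
\end{cases}
\]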
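A small worked example may help to see how the base pointer distance and the base address interact. The Python sketch below models a stack as a list of (function, location) pairs and the compiler information as a dictionary. All sizes, the stack base address and the field `ncallee` (standing for |callee|) are made-up values for illustration, and 32-bit arithmetic is modeled by masking.

```python
MASK32 = 0xFFFFFFFF   # arithmetic on 32-bit addresses


def dist(i, stack, info):
    """Bytes between the base pointers of frames i and i+1 (or base/stack pointer)."""
    f, loc = stack[i]
    if i == len(stack) - 1:   # topmost frame
        return info["fsizev"][f] + 4 * info["ncallee"] + info["fsizet"][(f, loc)]
    f_next, _ = stack[i + 1]
    return info["fsize"][(f, loc)] - info["fsizep"][f] + info["fsizep"][f_next]


def ba(i, stack, info):
    """Base address of frame i, computed downwards from the stack base address."""
    f0, _ = stack[0]
    base0 = (info["stackba"] - (info["fsizep"][f0] - 4)) & MASK32
    return (base0 - sum(dist(j, stack, info) for j in range(i))) & MASK32


# Hypothetical two-frame stack: main called g.
stack = [("main", 2), ("g", 0)]
info = {
    "stackba": 0x1000,
    "ncallee": 2,
    "fsizep": {"main": 8, "g": 12},
    "fsizev": {"main": 4, "g": 8},
    "fsizet": {("g", 0): 4},
    "fsize": {("main", 2): 40},
}
rbp = ba(1, stack, info)              # base address of the topmost frame
rsp = rbp - dist(1, stack, info)      # stack pointer below the topmost frame
```

With these numbers, the frame of main starts at 0xFFC, the frame of g at 0xFD0, and the stack pointer lies 20 bytes below the base of g.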
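Definition 5.62 (Return Address)
The function $ra_{C\text{-}IL}$ returns the return address of the i-th frame on the stack s.
\[
ra_{C\text{-}IL}(i \in \mathbb{N},\ s \in frame^{*}_{C\text{-}IL},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8},\ info \in info_{C\text{-}IL}) \in \mathbb{B}^{32} \;\overset{def}{=}\; m_4(ba_{C\text{-}IL}(i, s, info) + 4)
\]
Definition 5.63 (Previous Base Pointer)
The function $pbp_{C\text{-}IL}$ returns for a given frame i on the stack s the base address
of the previous frame.
\[
pbp_{C\text{-}IL}(i \in \mathbb{N},\ s \in frame^{*}_{C\text{-}IL},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8},\ info \in info_{C\text{-}IL}) \in \mathbb{B}^{32} \;\overset{def}{=}\; m_4(ba_{C\text{-}IL}(i, s, info))
\]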
5.3.2.1 Memory Layout
Definition 5.64 (Code Region)
With the compiler information we can now define the code region in the memory.
\[
CR(info \in info_{CC\text{-}IL}) \in 2^{\mathbb{B}^{32}} \;\overset{def}{=}\; [info.cba : info.cba + bin_{32}(info.csize - 1)]
\]
Definition 5.65 (Stack Region)
The stack region of thread i is defined by the thread’s stack base address and
the maximum stack size.
\[
StR(i \in Tid,\ info \in info_{CC\text{-}IL}) \in 2^{\mathbb{B}^{32}} \;\overset{def}{=}\; [info.stackba(i) - bin_{32}(info.stacksize + 1) : info.stackba(i)]
\]
Definition 5.66 (Code Region Address)
The predicate cr denotes whether a given address a belongs to the region where
the code resides.
\[
cr(a \in \mathbb{B}^{8 \cdot size_{ptr}},\ info \in info_{CC\text{-}IL}) \in \mathbb{B} \;\overset{def}{=}\; a \in CR(info)
\]
Definition 5.67 (Stack Region)
The predicate sr denotes whether a given address a belongs to the local stack
memory of some thread.
\[
sr(a \in \mathbb{B}^{8 \cdot size_{ptr}},\ info \in info_{CC\text{-}IL}) \in \mathbb{B} \;\overset{def}{=}\; \exists i \in Tid.\ a \in StR(i, info)
\]
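The region predicates are plain interval membership tests. The following Python sketch mirrors the interval bounds of the definitions of CR and StR literally, with addresses modeled as integers and without 32-bit wrap-around. The concrete base addresses and sizes are hypothetical values chosen for the example.

```python
def code_region(info):
    """Addresses [cba : cba + csize - 1] as a Python range."""
    return range(info["cba"], info["cba"] + info["csize"])


def stack_region(i, info):
    """Addresses [stackba(i) - (stacksize + 1) : stackba(i)] as a Python range."""
    top = info["stackba"][i]
    return range(top - (info["stacksize"] + 1), top + 1)


def cr(a, info):
    """Does address a belong to the code region?"""
    return a in code_region(info)


def sr(a, info):
    """Does address a belong to the stack region of some thread?"""
    return any(a in stack_region(i, info) for i in info["stackba"])


# Hypothetical memory layout with two threads.
info = {"cba": 0x100, "csize": 0x40, "stackba": {0: 0x2000, 1: 0x3000}, "stacksize": 0x100}
```

Membership in a `range` object is computed arithmetically, so the test stays cheap even for large regions.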
5.3.3 Compiler Consistency
The compiler consistency relation comprises the coherency between the sub-
components of given C-IL and MIPSP configurations, the C-IL program and the
compiled code.
• Control consistency defines the values of the program counter on the one
hand and the return address of the function frames on the other hand.
• Code consistency states that the compiled program is placed in a memory
region disjoint from the data, and that the compiled program corresponds
to the original code.
• Stack consistency talks about the local memory region where the stack
resides, the stack and base pointer and registers that store local variables.
All stack frames are properly stored and follow the calling convention.
• Memory consistency talks about the shared global memory of the program,
e.g. the global variables.
Defining compiler consistency we distinguish global and local properties.
• The global consistency consists of the memory consistency and the code
consistency.
• The local consistency consists of the control consistency and the stack
consistency.
The code consistency states that the program is stored at the proper memory
region and is disjoint from the data. In the absence of self modifying code this is a
read only region of the memory. For simplicity we omit here the formal definition
and only declare the function $consis^{code}_{CC\text{-}IL}$, which denotes code consistency.
\[
consis^{code}_{CC\text{-}IL}(info \in info_{CC\text{-}IL},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \in \mathbb{B}
\]
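Definition 5.68 (Memory Consistency)
The memory consistency defines the equivalence of the global data memory
in both machines. We denote it with the predicate:
\[
\begin{aligned}
& consis^{mem}_{CC\text{-}IL}(info \in info_{CC\text{-}IL},\ \mathcal{M} \in \mathbb{B}_{gm} \to \mathbb{B}^{8},\ h \in C_M) \in \mathbb{B} \;\overset{def}{=} \\
& \quad \forall a \in \mathbb{B}^{8 \cdot size_{ptr}}.\ \neg(sr(a, info) \vee cr(a, info) \vee a \in A_{ipcb}) \implies \mathcal{M}(a) = h.m(a)
\end{aligned}
\]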
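Memory consistency compares the two memories on every address outside the stack, code and IPCB regions. A minimal Python sketch of this check, assuming both memories are dictionaries over a small finite address range and that the excluded regions are given by a single predicate; `excluded` is a hypothetical stand-in for the disjunction of sr, cr and membership in the IPCB region.

```python
def consis_mem(c_memory, h_memory, addresses, excluded):
    """Both memories agree on every address that is not excluded."""
    return all(c_memory[a] == h_memory[a] for a in addresses if not excluded(a))


def excluded(a):
    """Hypothetical excluded region: addresses 6 and 7 (stack, code or IPCB)."""
    return a >= 6


addresses = range(8)
c_mem = {a: 0 for a in addresses}
h_mem = {**c_mem, 7: 99}          # differs only at an excluded address
consistent = consis_mem(c_mem, h_mem, addresses, excluded)
```

A difference inside an excluded region does not break memory consistency, whereas a difference at any other address does.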
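Definition 5.69 (Global Consistency)
The predicate $consis^{global}_{CC\text{-}IL}$ denotes the global consistency.
\[
\begin{aligned}
consis^{global}_{CC\text{-}IL}(c \in C_{CC\text{-}IL},\ info \in info_{CC\text{-}IL},\ h \in C_M) \in \mathbb{B} \;\overset{def}{=}\;
& consis^{code}_{CC\text{-}IL}(info, h.m) \\
\wedge\; & consis^{mem}_{CC\text{-}IL}(info, c.\mathcal{M}, h)
\end{aligned}
\]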
Definition 5.70 (Control Consistency)
The predicate $consis^{control}_{C\text{-}IL}$ denotes control consistency. It states that the value
of the program counter is the address of the current C-IL instruction. The value
of the return address in all stack frames is the address of the first post-call-code
instruction after the function call to the topmost function.
\[
\begin{aligned}
& consis^{control}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ core \in C_{CORE},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \;\overset{def}{=} \\
& \quad core.pc = info.stmtcode(s.f_{top}, s.loc_{top}) \\
& \quad \wedge\; \forall i \in \mathbb{N}.\ 0 < i < |s| \implies m_4(ra_{C\text{-}IL}(i, s, m, info)) = info.pccode(s.f_{i-1}, s.loc_{i-1})
\end{aligned}
\]
Definition 5.71 (Register Consistency)
The predicate $consis^{regs}_{C\text{-}IL}$ denotes register consistency. It states that the value
of the stack pointer register rsp refers to the top of the stack and the value of the
frame base pointer register rbp refers to the base address of the topmost frame.
\[
\begin{aligned}
& consis^{regs}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ core \in C_{CORE}) \equiv \\
& \quad core.gpr(rbp) = ba_{C\text{-}IL}(top(s), s, info) \\
& \quad \wedge\; core.gpr(rsp) = ba_{C\text{-}IL}(top(s), s, info) - bin_{32}(dist_{C\text{-}IL}(top(s), s, info))
\end{aligned}
\]
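Definition 5.72 (Previous Base Pointer Consistency)
The predicate $consis^{pbp}_{C\text{-}IL}$ denotes previous base pointer consistency. It states
that the first four bytes of every stack frame contain the base address of the
preceding stack frame.
\[
\begin{aligned}
& consis^{pbp}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad \forall i \in \mathbb{N}.\ 0 < i < |s| \implies pbp_{C\text{-}IL}(i, s, m, info) = ba_{C\text{-}IL}(i - 1, s, info)
\end{aligned}
\]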
Definition 5.73 (Local Variables Consistency)
The predicate $consis^{var}_{C\text{-}IL}$ denotes local variables consistency. It states that the
local variables and parameters are stored properly. The variables that are kept in
registers according to the compiler information are stored in the registers only for
the topmost frame. For the other frames they are on the stack either in the callee
save or in the caller save regions of the corresponding frame.
\[
\begin{aligned}
& consis^{var}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ \pi \in prog_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ core \in C_{CORE},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad \forall i, j \in \mathbb{N}.\ i < |s| \wedge j < info.lvarnum(s.f_i) \implies s.\mathcal{M}_i(v_j) = \\
& \quad
\begin{cases}
core.gpr(reg) & \text{if } reg \neq \bot \wedge i = top(s) \\
m_4(ba_i - bin_{32}(info.regoff(f, reg))) & \text{if } reg \neq \bot \wedge i < top(s) \\
m_{size_\theta(qt2t(t_j))}(ba_i + bin_{32}(4 + info.paroff(v_j, s.f_i))) & \text{if } reg = \bot \wedge j < npar \\
m_{size_\theta(qt2t(t_j))}(ba_i - bin_{32}(info.lvaroff(v_j, s.f_i))) & \text{if } reg = \bot \wedge j \geq npar
\end{cases}
\end{aligned}
\]
where
\[
\begin{aligned}
(v_j, t_j) &= (\pi.\mathcal{F}(s.f_i)).\mathcal{V}[j] \\
reg &= info.lvarreg(v_j, s.f_i, s.loc_i) \\
npar &= (\pi.\mathcal{F}(s.f_i)).npar \\
ba_i &= ba_{C\text{-}IL}(i, s, info)
\end{aligned}
\]
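Definition 5.74 (Stack Consistency)
The predicate $consis^{stack}_{C\text{-}IL}$ denotes stack consistency as a wrapper for register,
previous base pointer and local variables consistency.
\[
\begin{aligned}
& consis^{stack}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ core \in C_{CORE},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad consis^{regs}_{C\text{-}IL}(s, info, core) \\
& \quad \wedge\; consis^{pbp}_{C\text{-}IL}(s, info, m) \\
& \quad \wedge\; consis^{var}_{C\text{-}IL}(s, info, core, m)
\end{aligned}
\]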
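Definition 5.75 (Local Consistency)
The predicate $consis^{local}_{C\text{-}IL}$ denotes local compiler consistency.
\[
\begin{aligned}
& consis^{local}_{C\text{-}IL}(s \in frame^{*}_{C\text{-}IL},\ info \in info_{C\text{-}IL},\ core \in C_{CORE},\ m \in \mathbb{B}^{32} \to \mathbb{B}^{8}) \in \mathbb{B} \;\overset{def}{=} \\
& \quad consis^{control}_{C\text{-}IL}(s, info, core, m) \\
& \quad \wedge\; consis^{stack}_{C\text{-}IL}(s, info, core, m)
\end{aligned}
\]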
Definition 5.76 (Compiler Consistency)
The predicate $consis_{CC\text{-}IL}$ denotes compiler consistency.
\[
\begin{aligned}
& consis_{CC\text{-}IL}(c \in C_{CC\text{-}IL},\ info \in info_{CC\text{-}IL},\ h \in C_M,\ i \in Pid) \in \mathbb{B} \;\overset{def}{=} \\
& \quad consis^{global}_{CC\text{-}IL}(c, info, h) \\
& \quad \wedge\; consis^{local}_{C\text{-}IL}(c(i).s,\ info(i),\ h.core_i,\ h.m)
\end{aligned}
\]
5.3.4 Safety
Next, we define a C-IL program discipline that allows us to prove simulation be-
tween the program and its MIPSP execution. Programs that follow this discipline
we call ownership safe. For a safe C-IL program the compiler must guarantee
safety of the MIPSP execution.
In our work we assume that the considered C-IL programs have the same
number of threads as the number of processors in the underlying MIPSP , i.e.
Tid = Pid . Defining ownership for threads we also recall that the layout of the
C-IL memory is the same as the layout of the MIPSP memory, i.e. both memories
are flat and byte addressable. Basically we define the ownership for C-IL threads
equivalent to the ownership from Section 3.2. We define C-IL ownership-safety
following the same policy but for a subset of the complete MIPSP address space
AMIPSP .
We recall that the set of read only addresses for MIPSP contains the code
region in addition to AroC-IL. The code region is not part of the C-IL memory.
Definition 5.77 (C-IL Read Only Memory)
Thus the set of C-IL read only addresses is defined by the set of constant
variables.
\[
A^{ro}_{C\text{-}IL} = \{a \in \mathbb{B}^{32} \mid \exists (v, (q, t)) \in \pi.\mathcal{V}_G.\ q \in const \wedge a \in A_v\}
\]
where $A_v$ denotes the set of addresses occupied by the variable v
\[
A_v = [\theta.alloc_{gvar}(v) : \theta.alloc_{gvar}(v) + bin_{32}(size_\theta(t) - 1)]
\]
Furthermore the set of owned addresses for C-IL contains only global memory
addresses, since local variables are thread-local by default. We also have to exclude
the IPCB region from the C-IL ownership. That way we guarantee that the IPCB
addresses are constantly assigned to the corresponding processor. At the same
time we exclude accesses to IPCBs on the C-IL level.
Definition 5.78 (C-IL Ownership Address Space)
According to these conditions in the ownership address space for C-IL we
exclude from $A_{MIPSP}$ the IPCB region, the code region and the stack region as
defined in Section 5.3.2.1.
\[
A_{C\text{-}IL} \;\overset{def}{=}\; A_{MIPSP} \setminus (A_{code} \cup A_{stack} \cup A_{ipcb})
\]
Alongside the definitions of the static part of the ownership setting we require
in our simulation theorem that the dynamic part of the ownership settings on the
C-IL and on the MIPSP levels satisfy the following invariant.
Definition 5.79 (Ownership Consistency)
Given C-IL ownership $o_c$ and MIPSP ownership $o_m$ the predicate $consis_O$
states that:
• processor i owns the same addresses as thread i plus the corresponding
stack and IPCB regions,
• and the sets of shared addresses for C-IL and MIPSP are equal.
\[
\begin{aligned}
consis_O(o_c \in O,\ o_m \in O) \in \mathbb{B} \;\overset{def}{=}\;
& ownership\text{-}inv(o_c) \\
\wedge\; & ownership\text{-}inv(o_m) \\
\wedge\; & o_c.A_{sh} = o_m.A_{sh} \\
\wedge\; & \forall i \in Tid.\ (o_m.O_i \setminus o_c.O_i = StR(i, info) \cup A_{ipcb\,i})
\end{aligned}
\]
In order to complete the ownership setting for C-IL, we have to define the set
of C-IL I/O steps, i.e. steps which are allowed to access shared data.
5.3.4.1 I/O Steps
We consider the following steps I/O steps:
• steps with accesses to volatile variables,
• the read-modify-write step,
• and the send-ipi step.
We define the following predicates to detect volatile accesses.
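Definition 5.80 (Volatile Expression)
The predicate $evol^{\pi,\theta}_{c}$ denotes whether a given expression e contains volatile
accesses.
\[
evol^{\pi,\theta}_{c}(e \in \mathbb{E}) \in \mathbb{B} \;\overset{def}{=}\;
\begin{cases}
volatile \in q & \text{if } e \in \mathbb{V} \wedge \tau^{\pi,\theta}_{c}(e) = (q, t) \\
evol^{\pi,\theta}_{c}(e_0) & \text{if } (\exists \oplus \in O_1.\ e = \oplus e_0) \vee e = (t)e_0 \vee e = \mathbf{sizeof}(e_0) \vee e = \&(*(e_0)) \\
evol^{\pi,\theta}_{c}(e_0) \vee evol^{\pi,\theta}_{c}(e_1) & \text{if } \exists \oplus \in O_2.\ e = e_0 \oplus e_1 \\
evol^{\pi,\theta}_{c}(e_0) \vee evol^{\pi,\theta}_{c}(e_1) \vee evol^{\pi,\theta}_{c}(e_2) & \text{if } e = (e_0\ ?\ e_1 : e_2) \\
evol^{\pi,\theta}_{c}(e_0) \vee volatile \in q & \text{if } e = *(e_0) \wedge \tau^{\pi,\theta}_{c}(e_0) = (q', \mathbf{ptr}(q, t)) \\
& \quad \vee\; e = (e_0).f \wedge \tau^{\pi,\theta}_{c}(e) = (q, t) \\
0 & \text{otherwise}
\end{cases}
\]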
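The recursion of evol over the expression structure can be sketched in Python. The sketch assumes a hypothetical tuple encoding of expressions (`("var", name)`, `("unop", op, e0)`, `("cast", t, e0)`, `("sizeof", e0)`, `("addr_deref", e0)`, `("binop", op, e0, e1)`, `("cond", e0, e1, e2)`, `("deref", e0)`, `("field", e0, f)`) and a type function `tau` that returns a pair of a qualifier set and a type, where pointer types are encoded as `("ptr", qualifiers, type)`. None of these encodings come from the thesis, and expressions are assumed to be well typed.

```python
def evol(e, tau):
    """Does expression e contain a volatile access (in the sense of evol)?"""
    kind = e[0]
    if kind == "var":
        quals, _ = tau(e)
        return "volatile" in quals
    if kind in ("unop", "cast", "sizeof", "addr_deref"):
        return evol(e[-1], tau)                     # single subexpression e0
    if kind == "binop":
        return evol(e[2], tau) or evol(e[3], tau)
    if kind == "cond":
        return evol(e[1], tau) or evol(e[2], tau) or evol(e[3], tau)
    if kind == "deref":
        _, (_, pointee_quals, _) = tau(e[1])        # tau(e0) = (q', ptr(q, t))
        return evol(e[1], tau) or "volatile" in pointee_quals
    if kind == "field":
        quals, _ = tau(e)                           # tau(e) = (q, t)
        return evol(e[1], tau) or "volatile" in quals
    return False                                    # constants and everything else


# Hypothetical typing environment: x is plain, v is volatile,
# p is a non-volatile pointer to a volatile int.
TYPES = {
    ("var", "x"): (frozenset(), "int"),
    ("var", "v"): (frozenset({"volatile"}), "int"),
    ("var", "p"): (frozenset(), ("ptr", frozenset({"volatile"}), "int")),
}
tau = TYPES.__getitem__
```

In this toy environment, dereferencing `p` counts as a volatile access, whereas `&(*p)` does not, because only the pointer itself is evaluated.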
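Definition 5.81 (Volatile Statement)
The predicate $vol^{\pi,\theta}_{c}$ denotes whether a given statement s contains volatile
accesses.
\[
vol^{\pi,\theta}_{c}(s \in \mathbb{S}) \in \mathbb{B} \;\overset{def}{=}\;
\begin{cases}
evol^{\pi,\theta}_{c}(e_0) \vee evol^{\pi,\theta}_{c}(e_1) & \text{if } s = (e_0 = e_1) \\
evol^{\pi,\theta}_{c}(e) & \text{if } s \in \{\mathbf{ifnot}\ e\ \mathbf{goto}\ l,\ \mathbf{return}\ e\} \\
evol^{\pi,\theta}_{c}(e_0) \vee evol^{\pi,\theta}_{c}(e) \vee \exists e' \in E.\ evol^{\pi,\theta}_{c}(e') & \text{if } s = (e_0 = \mathbf{call}\ e(E)) \\
evol^{\pi,\theta}_{c}(e) \vee \exists e' \in E.\ evol^{\pi,\theta}_{c}(e') & \text{if } s = \mathbf{call}\ e(E) \\
0 & \text{otherwise}
\end{cases}
\]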
In order to keep subsequent definitions concise we limit the accesses to volatile
data to the following cases:
• Volatile variables may only be accessed in assignment statements or by the
intrinsic function rmw.
• Per assignment we allow only one access to a volatile variable, i.e. either a
volatile read on the right hand side or a volatile write on the left hand side.
• In case of a volatile read, the right hand side of assignments is either a
volatile variable identifier, or it is dereferencing a pointer expression which
is either volatile or pointing to a volatile variable.
• In case of a volatile write, the left hand side of assignments is either a
volatile variable identifier, or it is dereferencing a pointer expression which
is either volatile or pointing to a volatile variable.
We implement these rules in our definition of the I/O steps. Thus every other
access to volatile data will not be safe.
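Definition 5.82 (C-IL I/O Steps)
The predicate $iostep^{\pi,\theta}$ denotes whether the next step in a given C-IL
configuration c is an I/O step.
\[
\begin{aligned}
iostep^{\pi,\theta}(c \in C_{C\text{-}IL}) \in \mathbb{B} \;\overset{def}{=}\;
& stmt_{next}(c, \pi) = \mathbf{call}\ e(E) \wedge rmw^{\pi,\theta}_{c}(e) \\
\vee\; & stmt_{next}(c, \pi) = (e = e') \wedge vol^{\pi,\theta}_{c}(e) \wedge \neg vol^{\pi,\theta}_{c}(e') \\
& \wedge\; (e \in \mathbb{V} \vee (e = *(e'')) \wedge no2vol(e'')) \\
\vee\; & stmt_{next}(c, \pi) = (e = e') \wedge vol^{\pi,\theta}_{c}(e') \wedge \neg vol^{\pi,\theta}_{c}(e) \\
& \wedge\; (e' \in \mathbb{V} \vee (e' = *(e'')) \wedge no2vol(e''))
\end{aligned}
\]
where
\[
no2vol(e) \;\overset{def}{=}\; \exists q, q', t.\ \tau^{\pi,\theta}_{c}(e) = (q', \mathbf{ptr}(q, t)) \wedge volatile \notin q \cap q'.
\]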
The number of bytes accessed during a C-IL step varies depending on the
type of the accessed data. For safety we need to identify all accessed bytes
of the global memory. We have to examine the executed statement and define
recursively which memory addresses are accessed.
For a given configuration c, program π and context θ, the functions
\[
R^{\pi,\theta}_{c} :: \mathbb{S} \to 2^{\mathbb{B}^{32}}
\quad\text{and}\quad
W^{\pi,\theta}_{c} :: \mathbb{S} \to 2^{\mathbb{B}^{32}}
\]
return the read set and the write set of global addresses of the execution of a
given statement, respectively. We omit the formal definitions here. The sets of
read and written byte addresses are defined in Chapter 4.3 of [Bau14].
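Definition 5.83 (Safe C-IL Step)
The predicate $safestep^{\pi,\theta}_{\text{C-IL}}$ denotes whether, given a program π and a context θ, the next step of thread i in configuration c is ownership-safe according to the ownership pair o, o′.
\[
\begin{aligned}
& safestep^{\pi,\theta}_{\text{C-IL}}(c \in C_{\text{C-IL}},\ c' \in C_{\text{C-IL}},\ i \in Tid,\ o \in O,\ o' \in O) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad safeacc_{\text{C-IL}}(i,\ iostep^{\pi,\theta}(c),\ R^{\pi,\theta}_c(stmt_{next}(c, \pi)),\ W^{\pi,\theta}_c(stmt_{next}(c, \pi)),\ o) \\
& \quad \land\ safetransfer_{\text{C-IL}}(i,\ iostep^{\pi,\theta}(c),\ o,\ o')
\end{aligned}
\]
where the predicates
\[
safeacc_{\text{C-IL}}(i \in Tid,\ io \in \mathbb{B},\ R \in 2^{\mathbb{B}^{32}},\ W \in 2^{\mathbb{B}^{32}},\ o \in O) \in \mathbb{B}
\]
and
\[
safetransfer_{\text{C-IL}}(i \in Tid,\ io \in \mathbb{B},\ o \in O,\ o' \in O) \in \mathbb{B}
\]
are defined analogously to safeacc and safetransfer from Section 3.2.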
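Definition 5.84 (Safe C-IL Sequence)
The predicate $safeseq^{\pi,\theta}_{\text{C-IL}}$ denotes whether a sequence of steps of thread i is safe.
\[
\begin{aligned}
& safeseq^{\pi,\theta}_{\text{C-IL}}(c \in C_{\text{C-IL}},\ c' \in C_{\text{C-IL}},\ i \in Tid,\ o \in O,\ o' \in O) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad ownership\text{-}inv(o) \\
& \quad \land\ ((c = c' \land o = o') \\
& \qquad \lor\ (\forall c''.\ (\pi, \theta \vdash c \to_i c'') \implies \exists o''.\ safestep^{\pi,\theta}_{\text{C-IL}}(c, c'', i, o, o'') \\
& \qquad\qquad \land\ safeseq^{\pi,\theta}_{\text{C-IL}}(c'', c', i, o'', o')))
\end{aligned}
\]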
5.3.5 Compiler Correctness Theorem
The compiler consistency defines the simulation relation between C-IL and MIPSP. We call a compiler correct if it guarantees that in every consistency point the compiler consistency holds between the C-IL and the MIPSP machine. We define the compiler correctness theorem iteratively for system mode execution fragments of MIPSP extended IP schedule execution sequences. It states simulation between C-IL and MIPSP executions. Since we are talking about optimizing compilers, a simulation step can consist of several C-IL steps. For us a simulation step is a sub-sequence of the execution that is bordered by two subsequent consistency points.
Basically our theorem says that for every hypervisor execution portion of a reordered MIPSP execution between two consistency points there exists a number of steps on the C-IL level which simulate the considered MIPSP steps and lead the C-IL machine into a consistent state.
Theorem 5.1 (CC -IL Compiler Correctness) The compiler correctness theo-
rem says that, if
• pi ∈ progC-IL is a C-IL program,
• θ ∈ contextC-IL is the context of pi,
• info ∈ infoCC -IL is the static compiler information,
• h0 ∈ CM is the initial configuration of the machine executing the compiled
program,
• h0 β→ h′ is an arbitrary IP-schedule execution of the compiled program,
• hk is some hypervisor consistency point of any processor i in h0 β→ h′,
• hl is the next consistency point after hk,
• c is a C-IL configuration which is a consistency point of thread i,
• consistency holds for all running threads between c and hk,
• AC-IL, AroC-IL, AMIPSP and AroMIPSP are defined as in Section 5.3.4 and
Section 3.2.2 and om is a MIPSP ownership consistent with oc,
then there exists a configuration c′, between which and hl consistency holds for
all running threads. c′ is either equal to c or it is the next consistency point of
thread i and is obtained from c by processing some steps of thread i.
Furthermore, if
• oc and o′c are a valid C-IL ownership pair, such that the execution from c to c′ is ownership-safe according to oc and o′c,
• and om and o′m are a MIPSP ownership pair consistent with oc and o′c,
then also the corresponding MIPSP execution from hk to hl is ownership-safe according to om and o′m.
If the last instruction of the considered MIPSP sub-sequence is a store word
to the APIC, then the statement to be executed in c is a function call to the
external function sendipi . This implies that writes to the APIC originate only
from the sendipi function.
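\[
\begin{aligned}
& \forall h^0, \beta.\ IPsched(h^0, \beta) \\
& \land\ \forall c, o_c, o_m, i, k \in \mathbb{N} \land k < |\beta|. \\
& \qquad hypCP_{\text{C-IL}}(h^0, i, \beta, k) \\
& \qquad \land\ l = nextCP(h^0, \beta, k) \\
& \qquad \land\ CP^{\pi,\theta}_{\text{C-IL}}(c(i)) \\
& \qquad \land\ (\forall j \in Pid.\ running_j(\beta, k) \implies consis_{\text{C-IL}}(c, info, h^k, j)) \\
& \qquad \land\ consisO(o_c, o_m) \\
& \quad \implies \exists c'.\ \pi, \theta \vdash c \to^*_i c' \\
& \qquad \land\ CP^{\pi,\theta}_{\text{C-IL}}(c'(i)) \\
& \qquad \land\ (\forall j \in Pid.\ running_j(\beta, l) \implies consis_{\text{C-IL}}(c', info, h^l, j)) \\
& \qquad \land\ (\forall o'_c, o'_m.\ consisO(o'_c, o'_m) \land safeseq^{\pi,\theta}_{\text{C-IL}}(c(i), c'(i), i, o_c, o'_c) \\
& \qquad\qquad \implies safeseq_{MIPS_P}(h^k, \beta[k : l-1], o_m, o'_m)) \\
& \qquad \land\ (store_{A_{apic}}(h^{l-1}.core_i,\ I(h^{l-1}.core_i, h^{l-1}.m)) \\
& \qquad\qquad \implies stmt_{next}(c(i), \pi) = \textbf{call } sendipi())
\end{aligned}
\]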
Proof. The proof of this theorem is outside the scope of this thesis. Similar simulation theorems have been proven previously for C (or languages close to C). A compiler correctness theorem for a non-optimizing compiler is presented in [LP08, Lei08]. The proof of a compiler correctness theorem for an optimizing compiler is presented in [Ler09].
In [Bau14] a sequential simulation theorem has been applied in the proof of a general simulation theorem between two concurrent systems, including the transfer of ownership-safety. □
Chapter 6
CC-IL+IPI Semantics
In the previous chapter we presented the CC-IL semantics, which defines the execution of concurrent C-IL code. A hypervisor implemented in C-IL interacts with the underlying hardware. Its execution is influenced by hardware features like IPIs. The hypervisor uses IPIs and relies on their semantics. In hypervisors we send an IPI from C-IL, which is then broadcast on the MIPSP level and causes the execution of a given C-IL IPI handler as part of the IPI service routine. It appears that programmers have in mind a semantics which combines CC-IL and (parts of) MIPSP. In this chapter we define such a system semantics, which lifts the IPI semantics to the programming language level. We call this new semantics CC-IL+IPI.
In order to figure out the kind of IPI component required for our CC -IL
extension we consider the following abstract modeling of the IPI mechanism. A
thread i executes the C-IL external function sendipi. It is implemented by a write
to the APIC interrupt command register, which sets the DS bit of the processor’s
APIC. The MIPSP IPI transition broadcasts the IPI, clears the DS bit and sets
the corresponding interrupt request bit in the APICs of the targets. The targets
are interrupted and start executing the C-IL IPI handler, which is indicated by
clearing the corresponding interrupt request bit and setting the corresponding
interrupt in-service bit in the APIC. When the handler terminates the in-service
bit is cleared and the interrupted thread continues from the point, where it was
interrupted.
We define CC-IL+IPI in a way that allows us to talk about the complete scenario from above. CC-IL+IPI and CC-IL differ in the configuration and the operational semantics.
Definition 6.1 (Sequential C-IL+IPI Configuration)
We extend the C-IL configuration with a hardware-related component $apic \in C^{apic}_{\text{C-IL}}$, which contains three Boolean flags.
\[
C_{\text{C-IL+IPI}} \;\stackrel{def}{=}\; [\mathcal{M} \in \mathbb{B}_{gm} \to \mathbb{B}^8,\ s \in frame^*_{\text{C-IL}},\ apic \in C^{apic}_{\text{C-IL}}]
\]
\[
C^{apic}_{\text{C-IL}} \;\stackrel{def}{=}\; [DS \in \mathbb{B},\ IRR \in \mathbb{B},\ ISR \in \mathbb{B}]
\]
DS denotes a send-IPI request. IRR denotes a pending IPI request. ISR denotes that the thread executes the C-IL IPI handler.
Definition 6.2 (Concurrent C-IL+IPI Configuration)
A concurrent C-IL+IPI configuration is defined by the record $C_{\text{CC-IL+IPI}}$.
\[
C_{\text{CC-IL+IPI}} \;\stackrel{def}{=}\; [\mathcal{M} \in \mathbb{B}_{gm} \to \mathbb{B}^8,\ s \in Tid \to frame^*_{\text{C-IL}},\ apic \in C^{apic}_{\text{CC-IL}}]
\]
where the CC-IL+IPI APIC component contains one C-IL+IPI APIC per thread.
\[
C^{apic}_{\text{CC-IL}} \;\stackrel{def}{=}\; Tid \to C^{apic}_{\text{C-IL}}
\]
From a concurrent configuration $c \in C_{\text{CC-IL+IPI}}$ we can construct the sequential C-IL+IPI configuration $c(t) \in C_{\text{C-IL+IPI}}$ of a given thread $t \in Tid$.
\[
c(t \in Tid) \in C_{\text{C-IL+IPI}} \;\stackrel{def}{=}\; [c.\mathcal{M},\ c.s(t),\ c.apic(t)]
\]
In later statements and proofs we will need to convert a given concurrent CC-IL+IPI configuration to a CC-IL configuration and vice versa. Therefore we define the following two functions.
Definition 6.3 (Converting CC-IL+IPI)
The function chw2cil converts a given CC-IL+IPI configuration to a CC-IL configuration. The function cil2chw combines a given CC-IL configuration and a C-IL APIC into a CC-IL+IPI configuration.
\[
chw2cil(c \in C_{\text{CC-IL+IPI}}) \in C_{\text{CC-IL}} \;\stackrel{def}{=}\; [c.\mathcal{M},\ c.s]
\]
\[
cil2chw(c \in C_{\text{CC-IL}},\ apic \in C^{apic}_{\text{CC-IL}}) \in C_{\text{CC-IL+IPI}} \;\stackrel{def}{=}\; [c.\mathcal{M},\ c.s,\ apic]
\]
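The configuration records and conversion functions above can be illustrated with a small executable sketch. It is only a model under simplifying assumptions: memory is a Python dict from addresses to bytes, a stack is a tuple of opaque frames, and the class names `Apic`, `SeqConfig`, `ConcConfig`, `CilConfig` and the function name `project` (standing for c(t)) are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Apic:
    """Per-thread C-IL APIC: the three Boolean flags DS, IRR, ISR."""
    DS: bool = False
    IRR: bool = False
    ISR: bool = False

@dataclass(frozen=True)
class SeqConfig:
    """Sequential C-IL+IPI configuration [M, s, apic] of one thread."""
    M: dict      # global memory: address -> byte (simplified)
    s: tuple     # stack of frames of this thread (frames kept opaque)
    apic: Apic

@dataclass(frozen=True)
class ConcConfig:
    """Concurrent CC-IL+IPI configuration: one stack and one APIC per thread."""
    M: dict
    s: dict      # thread id -> stack
    apic: dict   # thread id -> Apic

@dataclass(frozen=True)
class CilConfig:
    """Concurrent CC-IL configuration, i.e. without the APIC component."""
    M: dict
    s: dict

def project(c: ConcConfig, t) -> SeqConfig:
    """c(t): the sequential configuration of thread t."""
    return SeqConfig(c.M, c.s[t], c.apic[t])

def chw2cil(c: ConcConfig) -> CilConfig:
    """Forget the APIC component."""
    return CilConfig(c.M, c.s)

def cil2chw(c: CilConfig, apic: dict) -> ConcConfig:
    """Attach a per-thread APIC map to a CC-IL configuration."""
    return ConcConfig(c.M, c.s, apic)
```

In this model, converting with chw2cil and re-attaching the same APIC map with cil2chw yields the original configuration again.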
6.1 CC-IL+IPI operational semantics
As previously mentioned, we intend to mirror steps of the hardware in our semantics. Thus the CC-IL+IPI operational semantics consists of
• steps of the threads that correspond to the C-IL program, which we denote by $\xrightarrow{cil}$, and
• two additional hardware steps for the invocation of the IPI handler (which we call Jump to IPI Service Routine, or JIPISR) and for the broadcasting of IPIs, which we denote by $\xrightarrow{jipisr}$ and $\xrightarrow{ipi}$ respectively.
We define $\xrightarrow{cil}$ in Section 6.1.1, $\xrightarrow{jipisr}$ in Section 6.1.2 and $\xrightarrow{ipi}$ in Section 6.1.3. The local steps $\xrightarrow{cil}$ and $\xrightarrow{jipisr}$ are executed on a sequential C-IL+IPI configuration of a given thread, and $\xrightarrow{ipi}$ is a global step executed on the whole CC-IL+IPI configuration.
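Definition 6.4 (CC-IL+IPI Step)
We define the CC-IL+IPI semantics top-down. A step in the CC-IL+IPI semantics is either a C-IL step of some thread, or a local JIPISR step, or an IPI step.
\[
\frac{\pi, \theta \vdash c \xrightarrow{cil}_t c' \;\lor\; \pi, \theta \vdash c \xrightarrow{jipisr}_t c' \;\lor\; \pi, \theta \vdash c \xrightarrow{ipi} c'}
     {\pi, \theta \vdash c \to c'}
\]
The execution of local $\xrightarrow{cil}$ and $\xrightarrow{jipisr}$ steps in the concurrent context is defined by the following two rules.
\[
\frac{\pi, \theta \vdash c(t) \xrightarrow{cil} (\mathcal{M}', s', apic') \qquad c' = (\mathcal{M}',\ c.s[t \mapsto s'],\ c.apic[t \mapsto apic'])}
     {\pi, \theta \vdash c \xrightarrow{cil}_t c'}
\]
\[
\frac{\pi, \theta \vdash c(t) \xrightarrow{jipisr} (\mathcal{M}', s', apic') \qquad c' = (c.\mathcal{M},\ c.s[t \mapsto s'],\ c.apic[t \mapsto apic'])}
     {\pi, \theta \vdash c \xrightarrow{jipisr}_t c'}
\]
We introduce the following shorthands for CC-IL+IPI steps.
\[
\begin{aligned}
c \xrightarrow[\pi,\theta]{cil(t)} c' \;&\stackrel{def}{=}\; \pi, \theta \vdash c \xrightarrow{cil}_t c' \\
c \xrightarrow[\pi,\theta]{jipisr(t)} c' \;&\stackrel{def}{=}\; \pi, \theta \vdash c \xrightarrow{jipisr}_t c' \\
c \xrightarrow[\pi,\theta]{ipi} c' \;&\stackrel{def}{=}\; \pi, \theta \vdash c \xrightarrow{ipi} c'
\end{aligned}
\]
We denote a sequence of steps by
\[
c \xrightarrow[\pi,\theta]{\beta} c',
\]
where
\[
\forall i \in [0 : |\beta| - 1].\ \beta_i = ipi \lor \exists t \in Tid.\ \beta_i = cil(t) \lor \beta_i = jipisr(t).
\]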
Definition 6.5 (Hardware Step)
We define a predicate that denotes, for a given step $c \xrightarrow[\pi,\theta]{\alpha} c'$, whether it is a local JIPISR step or an IPI step.
\[
hw\text{-}step(\alpha) \;\stackrel{def}{=}\; \alpha \in \{jipisr(t),\ ipi\}
\]
6.1.1 C-IL Steps
C-IL steps in the CC-IL+IPI semantics are mostly the same as the steps in the C-IL semantics presented in Section 5.1.10.
If the next statement is an assignment, goto, if-not-goto, function call, or return with result,
\[
stmt_{next}(c, \pi) \in \{e_0 = e_1,\ \textbf{goto } l,\ \textbf{ifnot } e \textbf{ goto } l,\ \textbf{call } e(E),\ e_0 = \textbf{call } e(E),\ \textbf{return } e\}
\]
we only strengthen the condition on the transition. In these cases we require that either there is no interrupt request or the thread is executing the service routine.
\[
(c.apic.IRR = 0) \lor (c.apic.ISR = 1)
\]
As an example we show the rule for executing an assignment statement in CC-IL+IPI.
\[
\frac{stmt_{next}(c, \pi) = (e_0 = e_1) \qquad ((c.apic.IRR = 0) \lor (c.apic.ISR = 1))}
     {\pi, \theta \vdash c \to incloc(write_\theta(c,\ [\&e_0]^{\pi,\theta}_c,\ [e_1]^{\pi,\theta}_c))}
\]
The execution of the return statement without return value is different in C-IL
and C -IL+IPI .
Definition 6.6 (C-IL+IPI Function Return without Result)
We distinguish between three types of return statements without return value:
• return statements executed by the program thread,
• return statements of handler functions called by the ipihandler,
• the return statement of the ipihandler.
For return statements executed by the program thread and return statements of handler functions called by the ipihandler we drop the frame as usual.
\[
\frac{stmt_{next}(c) = \textbf{return} \qquad c.rds_{top} = \bot \qquad \lnot((c.apic.IRR = 1 \land c.apic.ISR = 0) \lor (c.apic.ISR = 1 \land c.f_{top} = ipihandler))}
     {\pi, \theta \vdash c \to dropframe(c)}
\]
The return statement of the ipihandler ends the servicing of an IPI, so we need to extend the transition. In that case, in addition to dropping the frame, we clear the ISR bit.
\[
\frac{stmt_{next}(c) = \textbf{return} \qquad c.rds_{top} = \bot \qquad c.apic.ISR = 1 \qquad c.f_{top} = ipihandler}
     {\pi, \theta \vdash c \to dropframe(c)[apic.ISR \mapsto 0]}
\]
The function call to the external sendipi function changes the MIPSP APIC. In order to keep the C-IL APIC consistent with MIPSP, we redefine in our extended semantics the sendipi transition relation defined by θ.Rextern.
Definition 6.7 (sendipi Transition Relation)
In addition to the functionality from Definition 5.46, the write to the APIC ICR on the C-IL+IPI level sets the DS flag.
\[
(\bot, c, c') \in \theta.R_{extern}(sendipi) \implies c' = incloc(write_\theta(c, ICR_a, data))[DS \mapsto 1]
\]
6.1.2 JIPISR Step
The JIPISR step in our C -IL+IPI semantics defines the invocation of the inter-
rupt handler. It is processed if there is a pending IPI request. It changes only the
stack and the APIC of a single thread.
Definition 6.8 (IPI Handler Call)
The execution of the C-IL program is interrupted by pushing the frame of the IPI handler on the stack. The effect is similar to a (non-existing) function call to the IPI handler. Furthermore the IRR and the ISR flags are flipped.
\[
\frac{c.apic.IRR = 1 \qquad c.apic.ISR = 0}
     {\pi, \theta \vdash c \to frame_{ipi}(c)[apic.IRR \mapsto 0,\ apic.ISR \mapsto 1]}
\]
The function $frame_{ipi}$ creates a new frame for the IPI handler and pushes it on the top of the stack.
\[
frame_{ipi}(c \in C_{\text{C-IL}}) \in C_{\text{C-IL}} \;\stackrel{def}{=}\; c[s := (\mathcal{M}'_E, \bot, ipihandler, 0) \circ c.s]
\]
where $\mathcal{M}'_E$ has to offer enough space for the handler's local variables
\[
V' = \pi.F(ipihandler).V.
\]
The ipihandler has no parameters,
\[
\pi.F(ipihandler).npar = 0,
\]
thus $V'$ contains only local variables. Accordingly, the condition on the size of $\mathcal{M}'_E$ is the following:
\[
\forall i \in [0 : |V'| - 1].\ V'[i] = (v_i, t_i) \implies |\mathcal{M}'_E(v_i)| = size_\theta(t_i).
\]
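The interplay between the strengthened C-IL guard, the JIPISR step and the handler's return can be sketched as a small state machine over one thread's stack and APIC flags. This is a simplified model, not the formal semantics: frames are reduced to hypothetical `(function name, location)` pairs with the stack top at index 0, local-variable memory and the rds component are omitted, and the function names `cil_enabled`, `jipisr` and `return_without_result` are assumptions made for this illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Apic:
    DS: bool = False
    IRR: bool = False
    ISR: bool = False

def cil_enabled(apic: Apic) -> bool:
    """Guard added to ordinary C-IL steps: IRR = 0 or ISR = 1."""
    return (not apic.IRR) or apic.ISR

def jipisr(stack: tuple, apic: Apic):
    """JIPISR step: push the handler frame, clear IRR, set ISR."""
    if not (apic.IRR and not apic.ISR):
        raise ValueError("JIPISR needs a pending, not yet serviced request")
    handler_frame = ("ipihandler", 0)   # simplified frame: (function, location)
    return (handler_frame,) + stack, replace(apic, IRR=False, ISR=True)

def return_without_result(stack: tuple, apic: Apic):
    """Return without value: the handler's own return also clears ISR."""
    f_top = stack[0][0]
    if apic.ISR and f_top == "ipihandler":
        return stack[1:], replace(apic, ISR=False)
    if apic.IRR and not apic.ISR:
        raise ValueError("blocked: the pending IPI has to be serviced first")
    return stack[1:], apic
```

With a pending request (IRR set, ISR clear) the only enabled step of the thread is `jipisr`; after the handler's return the original stack and a cleared APIC are restored, so the interrupted thread continues where it left off.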
6.1.3 IPI Step
All the steps defined up to here are local and only change the apic of a single thread. The CC-IL+IPI IPI step is the only step that accesses the apic components of all threads in one step. It implements the broadcasting of an interrupt.
Definition 6.9 (Broadcast IPI Step)
If the DS flag of some thread indicates a send-IPI request, the IPI is broadcast to all other threads.
\[
\frac{c.apic(t).DS = 1}
     {\pi, \theta \vdash c \to c[apic \mapsto sendipi(c.apic, t)]}
\]
The function sendipi defines the changes to the configurations of all threads. It clears the DS flag of the sending thread and sets the IRR flags of all other threads.
\[
sendipi(apic \in C^{apic}_{\text{CC-IL}},\ t \in Tid) \in C^{apic}_{\text{CC-IL}}
\]
\[
sendipi(apic, t)(t') \;\stackrel{def}{=}\;
\begin{cases}
(0,\ apic(t').IRR,\ apic(t').ISR) & \text{if } t' = t \\
(apic(t').DS,\ 1,\ apic(t').ISR) & \text{otherwise}
\end{cases}
\]
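As a minimal executable reading of the broadcast step, the following sketch applies the case distinction of sendipi to a dictionary of per-thread APICs. The representation (a `NamedTuple` triple per thread, thread ids as dictionary keys) and the wrapper name `ipi_step` are assumptions made for this illustration.

```python
from typing import Dict, NamedTuple

class Apic(NamedTuple):
    DS: bool
    IRR: bool
    ISR: bool

def sendipi(apic: Dict[int, Apic], t: int) -> Dict[int, Apic]:
    """Clear DS of the sender t and set IRR of every other thread."""
    return {
        u: Apic(False, a.IRR, a.ISR) if u == t else Apic(a.DS, True, a.ISR)
        for u, a in apic.items()
    }

def ipi_step(apic: Dict[int, Apic], t: int) -> Dict[int, Apic]:
    """IPI step of sender t: enabled only if t has a send-IPI request (DS = 1)."""
    if not apic[t].DS:
        raise ValueError("no send-IPI request pending for this thread")
    return sendipi(apic, t)
```

Note that, as in the definition, the sender's own IRR and ISR flags and the DS and ISR flags of the receivers are left untouched.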
6.2 CC-IL+IPI Safety
In the previous chapter we have defined ownership safety for C-IL program thread steps (Definition 5.83) and for C-IL step sequences of a given thread (Definition 5.84). In the ownership safety of CC-IL+IPI executions we additionally consider the new hardware steps and also distinguish between C-IL steps of a program thread and C-IL steps of an interrupt thread. Furthermore we extend the ownership to the concurrent context, where an execution sequence contains steps of different threads. We want to use the OXT ownership from the MIPSP level on the CC-IL+IPI level. Since in OXT the stack addresses are visible, we would actually need another ownership type for CC-IL+IPI. Our workaround is to define an empty stack address set on the CC-IL+IPI level.
The static part of the ownership, i.e. the address space and the read-only memory, is defined by AC-IL and AroC-IL.
The dynamic part of OXT contains program thread addresses, handler thread addresses and stack addresses. Thus we need to redefine the ownership consistency for our extended simulation.
Definition 6.10 (Ownership Consistency XT)
Given a CC-IL+IPI ownership $o_c$ and a MIPSP ownership $o_m$, the predicate $consisO_{XT}$ states that:
• $o_c$ and $o_m$ satisfy the ownership invariant,
• the sets of shared addresses for C-IL and MIPSP are equal,
• the set of program thread addresses for processor i is equal to the set of addresses owned by the C-IL program thread i,
• the set of handler thread addresses for processor i is equal to the set of addresses owned by the C-IL handler thread i plus the corresponding IPCB regions,
• the set of stack addresses for processor i is equal to the set of addresses in the stack region of the C-IL thread i,
• and the set of stack addresses for CC-IL+IPI thread i is empty.
\[
\begin{aligned}
consisO_{XT}(o_c \in O_{XT},\ o_m \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=}\;
 & ownership\text{-}inv_{XT}(o_c) \\
\land\ & ownership\text{-}inv_{XT}(o_m) \\
\land\ & o_c.A_{sh} = o_m.A_{sh} \\
\land\ & \forall i \in Tid.\ (o_m.O^p_i = o_c.O^p_i) \\
\land\ & \forall i \in Tid.\ (o_m.O^h_i = o_c.O^h_i \cup A_{ipcb_i}) \\
\land\ & \forall i \in Tid.\ (o_m.O^s_i = StR(i, info)) \\
\land\ & \forall i \in Tid.\ (o_c.O^s_i = \emptyset)
\end{aligned}
\]
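Definition 6.11 (Safe CC-IL+IPI Step)
The predicate $safestep^{\pi,\theta}_{\text{C-IL+IPI}}$ denotes whether, for a given program π and a context θ, the next step of thread i in configuration c is ownership-safe according to the ownership pair o, o′.
\[
\begin{aligned}
& safestep^{\pi,\theta}_{\text{C-IL+IPI}}(c \in C_{\text{C-IL+IPI}},\ c' \in C_{\text{C-IL+IPI}},\ i \in Tid,\ ih \in \mathbb{B},\ o \in O_{XT},\ o' \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad safeacc_{\text{C-IL+IPI}}(i,\ iostep^{\pi,\theta}(c),\ ih,\ R^{\pi,\theta}_c(stmt_{next}(c, \pi)),\ W^{\pi,\theta}_c(stmt_{next}(c, \pi)),\ o) \\
& \quad \land\ safetransfer_{\text{C-IL+IPI}}(i,\ iostep^{\pi,\theta}(c),\ ih,\ o,\ o')
\end{aligned}
\]
where
\[
safetransfer_{\text{C-IL+IPI}}(i \in Tid,\ io \in \mathbb{B},\ ih \in \mathbb{B},\ o \in O_{XT},\ o' \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=}\; safetransfer_{XT}(i, io, ih, o, o')
\]
and the predicate
\[
\begin{aligned}
& safeacc_{\text{C-IL+IPI}}(i \in Tid,\ io \in \mathbb{B},\ ih \in \mathbb{B},\ R \in 2^{\mathbb{B}^{32}},\ W \in 2^{\mathbb{B}^{32}},\ o \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad \begin{cases}
(R \subseteq o.O^h_i \cup o.A_{sh} \cup o.A_{ro}) \land (W \subseteq o.O^h_i \cup (o.A_{sh} \setminus o.O^h_i)) & \text{if } ih \land io \\
(R \subseteq o.O^h_i \cup o.A_{ro}) \land (W \subseteq o.O^h_i \setminus o.A_{sh}) & \text{if } ih \land \lnot io \\
(R \subseteq o.O^p_i \cup o.A_{sh} \cup o.A_{ro}) \land (W \subseteq o.O^p_i \cup (o.A_{sh} \setminus o.O^p_i)) & \text{if } \lnot ih \land io \\
(R \subseteq o.O^p_i \cup o.A_{ro}) \land (W \subseteq o.O^p_i \setminus o.A_{sh}) & \text{if } \lnot ih \land \lnot io
\end{cases}
\end{aligned}
\]
is defined similarly to $safeacc_{XT}$ (Definition 4.6) from Section 4.1.
The only difference appears in the accesses to local variables, which on the C-IL+IPI level are safe by default due to the separate frames for the program thread and the handler thread. Therefore we have to remove the conditions for the stack addresses in the definition of $safeacc_{\text{C-IL+IPI}}$.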
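The four cases of the access-safety predicate differ only in which owned set is used (handler or program thread) and in whether shared addresses may be touched (I/O steps only). The following sketch spells this out with Python sets. The `Ownership` record and its field names mirror the symbols O^p, O^h, A_sh and A_ro but are otherwise hypothetical, and addresses are plain integers.

```python
from dataclasses import dataclass

@dataclass
class Ownership:
    Op: dict    # thread id -> addresses owned by its program thread
    Oh: dict    # thread id -> addresses owned by its handler thread
    Ash: set    # shared addresses
    Aro: set    # read-only addresses

def safeacc(i, io: bool, ih: bool, R: set, W: set, o: Ownership) -> bool:
    """Access safety of one step of thread i with read set R and write set W."""
    own = o.Oh[i] if ih else o.Op[i]          # handler steps use O^h_i, else O^p_i
    if io:
        # I/O steps may additionally read and write shared addresses.
        reads_ok = R <= (own | o.Ash | o.Aro)
        writes_ok = W <= (own | (o.Ash - own))
    else:
        # Non-I/O steps stay within owned, unshared data (plus read-only reads).
        reads_ok = R <= (own | o.Aro)
        writes_ok = W <= (own - o.Ash)
    return reads_ok and writes_ok
```

For instance, with a shared address 10, a non-I/O step that reads 10 is rejected, while the same read performed by an I/O step is accepted.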
Definition 6.12 (Safe C-IL+IPI Sequence)
The predicate $safeseq^{\pi,\theta}_{\text{C-IL+IPI}}$ denotes whether a sequence of C-IL steps of thread i in a C-IL+IPI execution is safe.
\[
\begin{aligned}
& safeseq^{\pi,\theta}_{\text{C-IL+IPI}}(c \in C_{\text{C-IL+IPI}},\ c' \in C_{\text{C-IL+IPI}},\ i \in Tid,\ o \in O_{XT},\ o' \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad ownership\text{-}inv_{XT}(o) \\
& \quad \land\ ((c = c' \land o = o') \\
& \qquad \lor\ (\forall c''.\ (\pi, \theta \vdash c \xrightarrow{cil}_i c'') \implies \exists o''.\ safestep^{\pi,\theta}_{\text{C-IL+IPI}}(c, c'', i, o, o'') \\
& \qquad\qquad \land\ safeseq^{\pi,\theta}_{\text{C-IL+IPI}}(c'', c', i, o'', o')))
\end{aligned}
\]
In order to talk about a complete C-IL+IPI execution and also cover the new hardware steps in the semantics, we need another definition. We note that CC-IL+IPI hardware steps are safe by definition and do not transfer ownership.
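Definition 6.13 (Safe CC-IL+IPI Sequence)
The predicate $safeseq^{\pi,\theta}_{\text{CC-IL+IPI}}$ denotes the ownership safety of an execution sequence $c \xrightarrow[\pi,\theta]{\beta} c'$.
\[
\begin{aligned}
& safeseq^{\pi,\theta}_{\text{CC-IL+IPI}}(c, \beta, c', o, o') \in \mathbb{B} \;\stackrel{def}{=}\; ownership\text{-}inv_{XT}(o) \land ownership\text{-}inv_{XT}(o') \\
& \quad \land\ (\beta = \varepsilon \land c = c' \\
& \qquad \lor\ \forall c''.\ c \xrightarrow[\pi,\theta]{\beta_0} c'' \implies \\
& \qquad\quad (hw\text{-}step(\beta_0) \implies safeseq^{\pi,\theta}_{\text{CC-IL+IPI}}(c'', tl(\beta), c', o, o') \\
& \qquad\quad \land\ \beta_0 = cil(t) \implies \exists o''.\ safestep^{\pi,\theta}_{\text{C-IL+IPI}}(c(t), c''(t), t, c.apic(t).ISR, o, o'') \\
& \qquad\qquad \land\ safeseq^{\pi,\theta}_{\text{CC-IL+IPI}}(c'', tl(\beta), c', o'', o')))
\end{aligned}
\]
[Figure 6.1: MIPSP execution sequence with IPI handling. The states h0 to h6 are separated by the phases JISR, ISR1, Prolog, ipihandler Body, Epilog and ISR2; Prolog, Body and Epilog are compiled code.]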
Definition 6.14 (Safe Program)
A given program π is ownership-safe according to an ownership state o if all possible executions of it starting in configuration c are safe.
\[
safeprog^{\pi,\theta}_{\text{CC-IL+IPI}}(c \in C_{\text{CC-IL+IPI}},\ o \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=}\;
\forall c', \beta.\ c \xrightarrow[\pi,\theta]{\beta} c' \implies \exists o'.\ safeseq^{\pi,\theta}_{\text{CC-IL+IPI}}(c, \beta, c', o, o')
\]
6.3 IPI Service Routine
We want to prove that a trace of the CC -IL+IPI machine corresponds to a
trace of the MIPSP that includes handling of an IPI. Therefore we need to take
a deeper look at the IPI service routine execution sequences on MIPSP (Figure
6.1).
We relate the execution of the interrupt service routine on MIPSP to the
C-IL program and the compiler and distinguish the following parts in it. The IPI
service routine starts with JISR and continues with the execution of the ISR code.
The ISR in case of an IPI consists of portions written in assembler and the C-IL function ipihandler. In Figure 6.2 we sketch the assembler implementation of the ISR. It begins with a list of instructions ISR1. The instructions in ISR1 store the complete context in the processor's IPCB and dispatch the interrupt to the IPI handler.
[Figure 6.2: ISR implementation consisting of ISR1 and ISR2. ISR1 starts at address 0 and ends with a jump instruction (j ipihandler at address ISR2base − 4); the instructions before the jump are labelled ISR1−. ISR2 starts at address ISR2base and ends with an eret instruction.]
ISR1 ends with a function call to the C-IL handler. The implementation
of the call decreases the stack pointer by four (allocating space for the return address), stores the program counter (return address) in that newly allocated stack region by a sw instruction and sets the program counter to the beginning of ipihandler. Since store word cannot directly access the program counter, we first need to store it in the link register by a jump-and-link instruction and then add the corresponding offset to the linked program counter before storing it in the stack. The last instruction in ISR1 is a jump instruction. We refer to the instructions before the jump instruction by ISR1−. The label of the C-IL function in the jump instruction is replaced by the compiler with the address of the first prolog instruction of the function [Lev99]. After the prolog has set up the stack frame for the function we have reached a consistency point. The execution continues with the instructions that implement the body of ipihandler. Then the epilog is executed and we continue with the instruction after the jump to ipihandler. The ISR instructions that follow the jump restore the context from the IPCB. We refer to this list of instructions by ISR2. The last instruction in ISR2 is eret, which concludes the service routine.
The corresponding execution in CC-IL+IPI is depicted in Figure 6.3. In c0 we have a pending IPI request, which according to the CC-IL+IPI semantics leads to the execution of a JIPISR step, followed by the execution of ipihandler. An important state is the configuration before the return statement of the ipihandler. The execution of this return statement also changes the APIC component.
[Figure 6.3: CC-IL+IPI execution sequence with IPI handling. The states c0 to c3 are separated by the JIPISR step, the ipihandler Body and the return statement.]
In Figure 6.4 we depict both executions together. Significant states in the MIPSP execution, in consideration of compiler consistency, are h0, h3, h4, and h6.
Actually in h4 the execution should continue with the return code. We expect
that for a return statement without return value compilers generate an empty list
of instructions. In such cases all required actions for completing the function call
are implemented by the function’s epilog. Therefore we consider in our figures the
return code as part of the epilog for simplicity. Nevertheless in formal definitions
we refer to this point in the execution by a program counter pointing to the first
instruction of the return statement.
info.stmtcode(ipihandler , |pi.F(ipihandler).P| − 1)
In case of an empty return code the compiler would assign the epilog to the return
statement.
info.stmtcode(ipihandler , |pi.F(ipihandler).P| − 1) = info.ecode(ipihandler)
The MIPSP configuration during the ISR depends on the concrete implementation
of the assembler portions and the prolog instructions generated by the compiler.
We do not consider the details of the step by step execution in this work, but
rather give a specification of the ISR. We omit the tedious details of the hypervisor
ISR implementation, i.e. the instruction lists ISR1 and ISR2. A similar example
can be found in [Sha12].
In order to formalize our specification of ISR we need to define the execution
of instruction lists on MIPSP .
Definition 6.15 (Instruction List Execution)
The predicate instrexec denotes the uninterrupted execution of the instruction list l without branch and jump instructions on the processor i starting in configuration h. It comprises all required conditions on the configurations and the execution sequence, e.g. the program counter of the processor points to the first instruction of the list, $h.m_4(h.pc_i) = l_0$, and $h'$ is reached after executing $n = |l|$ (non-interrupted) core steps on processor i.
\[
\begin{aligned}
& instrexec(h \in C_M,\ \beta \in (\Sigma_M)^*,\ l \in (\mathbb{B}^{32})^*,\ i \in Pid,\ h' \in C_M) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad h.m_{4 \cdot |l|}(h.pc_i) = l \\
& \quad \land\ h \xrightarrow{\beta} h' \\
& \quad \land\ |\beta| = |l| \\
& \quad \land\ \forall j \in [0 : |\beta| - 1].\ core(\beta_j) \land \beta_j.pid = i \\
& \quad \land\ \forall j \in [0 : |\beta| - 1].\ \lnot isJISR(h^j.core_i,\ eev^j,\ I(h^j.core_i, h^j.M)) \\
& \quad \land\ \forall j \in [1 : |\beta| - 1].\ h^j.pc_i = h.pc_i + bin_{32}(4 \cdot (j - 1))
\end{aligned}
\]
where
\[
eev^j =
\begin{cases}
eev(h^j.apic_{pid}) & \text{if } exint(h^j.core_{pid},\ h^j.apic_{pid}) \\
0^{256} & \text{otherwise}
\end{cases}
\]
We also use the following shorthand notation.
\[
h \xrightarrow[l,\ i]{\beta} h' \;=\; instrexec(h, \beta, l, i, h')
\]
[Figure 6.4: Simulation between CC-IL+IPI and MIPSP execution sequence with IPI handling. The MIPSP sequence h0 to h6 of Figure 6.1 is shown together with the CC-IL+IPI sequence c0 to c3 of Figure 6.3; consis marks the states related by compiler consistency.]
The following software condition must be satisfied by the concrete hypervisor
implementation in order to apply our approach.
Software Condition 9 (ISR) Given an ISR implementation with the structure shown in Figure 6.2, we have to prove that after the execution of ISR1−:
• the program counter points to the jump to ipihandler,
• the stack pointer is decreased by four,
• the base pointer is unchanged,
• the memory location pointed to by the stack pointer stores the address of the first instruction of ISR2,
• the IPCB of the processor stores the initial core state,
• all other components are unchanged.
\[
\begin{aligned}
& \forall h, \beta, h', i.\ h \xrightarrow[ISR1^-,\ i]{\beta} h' \implies \\
& \quad h'.pc_i = ISR2base - bin_{32}(4) \\
& \quad \land\ h'.rsp_i = h.rsp_i - bin_{32}(4) \\
& \quad \land\ h'.rbp_i = h.rbp_i \\
& \quad \land\ h'.m_4(h.rsp_i - bin_{32}(4)) = ISR2base \\
& \quad \land\ ipcb(h', i) = h.core_i \\
& \quad \land\ \forall a \in \mathbb{B}^{32}.\ a \notin (A_{ipcb} \cup [h.rsp_i : h'.rsp_i]) \implies h.m(a) = h'.m(a) \\
& \quad \land\ \forall j \in Pid.\ j \neq i \implies h.core_j = h'.core_j \\
& \quad \land\ \forall j \in Pid.\ h.apic_j = h'.apic_j
\end{aligned}
\]
The stack layout is depicted in Figure 6.5.
[Figure 6.5: Stack layout in h′ after executing ISR1− in h. The base pointer is unchanged (h′.rbp_i = h.rbp_i); below the stack frame of the interrupted function in the program thread region, h′.rsp_i points to the return address ISR2base in the handler thread region.]
After the execution of ISR2 the machine state has to satisfy the following
conditions:
• the program counter points to the interrupted instruction,
• the core state is restored from the IPCB,
• the IPI in-service bit in the APIC is cleared,
• all other components are unchanged.
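\[
\begin{aligned}
& \forall h, \beta, h', i.\ h \xrightarrow[ISR2,\ i]{\beta} h' \implies \\
& \quad h'.pc_i = ipcb(h, i).epc \\
& \quad \land\ h'.rbp_i = h.rbp_i \\
& \quad \land\ h'.rsp_i = h.rsp_i \\
& \quad \land\ h'.core_i = ipcb(h, i) \\
& \quad \land\ h'.apic_i = h.apic_i[isr[0] \mapsto 0] \\
& \quad \land\ \forall a \in \mathbb{B}^{32}.\ h'.m(a) = h.m(a) \\
& \quad \land\ \forall j \in Pid.\ j \neq i \implies h'.core_j = h.core_j \\
& \quad \land\ \forall j \in Pid.\ j \neq i \implies h'.apic_j = h.apic_j
\end{aligned}
\]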
Since we heavily rely on the values stored in the IPCB we state the following
software condition.
Software Condition 10 (IPCB) All generated instructions which are executed
during IPI handling, i.e. the code for ipihandler and all its callee functions, do
not change the processor’s IPCB.
In our ownership setting, i.e. excluding the IPCB from the CC -IL+IPI ad-
dress space, and for a correct compiler this condition is guaranteed for ownership-
safe programs.
In the following two software conditions we state the validity of the prolog and
epilog code, that have to be met by the compiler. We constrain our predicates
only to the state components relevant for our simulation relation. A complete
prolog / epilog specification must at least additionally talk about the saving /
restoring of callee-save registers. Since ISR2 does not depend on callee-save
registers the given conditions are sufficient for our purposes.
Software Condition 11 (Prolog Correctness) The predicate pcodeV denotes the validity of the prolog of a given function f. It states that if:
• f ∈ Fname is a function name,
• info ∈ infoCC-IL is the static compiler information,
• $pcode_f \in (\mathbb{B}^{32})^*$ is the prolog code generated for f,
\[
pcode_f = h.m_{info.pcodesize(f)}(info.pcode(f))
\]
• h ∈ CM is the initial configuration of the prolog execution,
• h′ ∈ CM is the final configuration of the prolog execution,
then the execution of pcode(f) ends in a configuration with the program counter pointing to the code of the first statement in f. Furthermore pcode(f) allocates stack space for f's frame (assuming that the space for parameters and the return address is already allocated) and adjusts the stack and base pointer. The previous base pointer is stored in the stack. The prolog does not change anything but the state of the executing processor and the portion of the stack which it allocates.
\[
\begin{aligned}
& pcodeV^{\pi,\theta}(h \in C_M,\ info \in info_{\text{CC-IL}},\ f \in F_{name},\ i \in Pid,\ o \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad \forall \beta, h'.\ h \xrightarrow[pcode_f,\ i]{\beta} h' \implies \\
& \qquad h'.pc_i = info.stmtcode(f, 0) \\
& \qquad \land\ h'.rbp_i = h.rsp_i - bin_{32}(4) \\
& \qquad \land\ h'.rsp_i = h.rsp_i - bin_{32}(info.fsize(f, 0) + info.fsizep(f) + 4) \\
& \qquad \land\ h'.m_4(h.rsp_i - bin_{32}(4)) = h.rbp_i \\
& \qquad \land\ \forall a \in \mathbb{B}^{32}.\ a \notin [h.rsp_i : h'.rsp_i] \implies h'.m(a) = h.m(a) \\
& \qquad \land\ \forall j \in Pid.\ j \neq i \implies h'.core_j = h.core_j \\
& \qquad \land\ \forall j \in Pid.\ h'.apic_j = h.apic_j
\end{aligned}
\]
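The pointer arithmetic in the prolog condition can be checked on concrete numbers. The sketch below computes the expected base and stack pointer after the prolog from the stack pointer before it. It assumes that subtraction on 32-bit strings wraps around modulo 2^32; the function name `prolog_post` and its parameters `fsize0` and `fsizep` (standing for info.fsize(f, 0) and info.fsizep(f), both in bytes) are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit arithmetic wraps around modulo 2**32

def prolog_post(rsp: int, fsize0: int, fsizep: int):
    """Expected (rbp', rsp') after the prolog, given rsp before the prolog.

    rbp' = rsp - 4                         (slot holding the previous base pointer)
    rsp' = rsp - (fsize0 + fsizep + 4)     (frame allocated below that slot)
    """
    rbp_new = (rsp - 4) & MASK32
    rsp_new = (rsp - (fsize0 + fsizep + 4)) & MASK32
    return rbp_new, rsp_new

# Example: stack pointer 0x1000, 16 bytes of frame data and 8 bytes for fsizep.
rbp_after, rsp_after = prolog_post(0x1000, 16, 8)
```

In this example the new base pointer is 0xFFC, the address at which the condition requires the previous base pointer to be stored, and the new stack pointer is 0xFE4.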
Software Condition 12 (Epilog Correctness) The predicate ecodeV denotes the validity of the epilog of a given function f from program π. It states that if:
• f ∈ Fname is a function name,
• info ∈ infoCC-IL is the static compiler information,
• $ecode_f \in (\mathbb{B}^{32})^*$ is the epilog code generated for f,
\[
ecode_f = h.m_{info.ecodesize(f)}(info.ecode(f))
\]
• h ∈ CM is the initial configuration of the epilog execution,
• h′ ∈ CM is the final configuration of the epilog execution,
then the execution of $ecode_f$ frees the stack space allocated for f's frame and restores the program counter and the previous base pointer from the frame header.
\[
\begin{aligned}
& ecodeV^{\pi,\theta}(h \in C_M,\ info \in info_{\text{CC-IL}},\ f \in F_{name},\ i \in Pid,\ o \in O_{XT}) \in \mathbb{B} \;\stackrel{def}{=} \\
& \quad \forall \beta, h'.\ h \xrightarrow[ecode_f,\ i]{\beta} h' \implies \\
& \qquad h'.pc_i = h.m_4(h.rbp_i + bin_{32}(4)) \\
& \qquad \land\ h'.rbp_i = h.m_4(h.rbp_i) \\
& \qquad \land\ h'.rsp_i = h.rsp_i + bin_{32}(info.fsize(f, |\pi.F(f).P| - 1)) \\
& \qquad \land\ \forall a \in \mathbb{B}^{32}.\ h'.m(a) = h.m(a) \\
& \qquad \land\ \forall j \in Pid.\ j \neq i \implies h'.core_j = h.core_j \\
& \qquad \land\ \forall j \in Pid.\ h'.apic_j = h.apic_j
\end{aligned}
\]
6.4 CC-IL+IPI Simulation
Based on C-IL compiler consistency we want to prove simulation between CC -IL+IPI
and MIPSP executions, where the extended CC -IL+IPI semantics allow us to
prove simulation for more general MIPSP executions than the compiler correct-
ness for ordinary C-IL from the previous chapter. In addition to the executions
covered by Theorem 5.1 we now consider MIPSP executions including IPIs.
6.4.1 CC-IL+IPI Simulation Relation
Since the IPI handler does not have a caller its stack frame header does not have a
return address to a C-IL location. Thus we have to change the control consistency
predicate and exclude the ipihandler frame from the return address condition.
Definition 6.16 (C-IL+IPI Control Consistency)
The predicate $consis^{control}_{\text{C-IL+IPI}}$ states that the value of the program counter is the address of the current C-IL instruction. For any function f of the program except the first function (main) and ipihandler, the value of the return address in the function's stack frame is the address of the first post-call-code instruction after the function call to f.
\[
\begin{aligned}
& consis^{control}_{\text{C-IL+IPI}}(s \in frame^*_{\text{C-IL}},\ info \in info_{\text{C-IL}},\ core \in C_{CORE},\ m \in \mathbb{B}^{32} \to \mathbb{B}^8) \;\stackrel{def}{=} \\
& \quad core.pc = info.stmtcode(s.f_{top}, s.loc_{top}) \\
& \quad \land\ \forall i \in [1 : |s| - 1].\ s.f_i \neq ipihandler \implies \\
& \qquad m_4(ra_{\text{C-IL}}(i, s, m, info)) = info.pccode(s.f_{i-1}, s.loc_{i-1})
\end{aligned}
\]
With the next software condition we guarantee the proper flow of the program
despite the weakening of the control consistency.
6.4. CC-IL+IPI SIMULATION 169
Software Condition 13 (Calls ipihandler) In a given program pi does not exist
a function call statement to ipihandler .
We also have to change the consistency condition for the local variables. The
change concerns the top-most frame at the time the interrupt is raised, i.e. the
frame preceding the ipihandler frame. Since the invocation of ipihandler happens
without a function call, there is no pre-call code and thus the caller-save
registers are not saved on the stack. Still, the interrupted function may have
local variables that are stored in caller-save registers, which may then be
overwritten by the ISR.
Definition 6.17 (C-IL+IPI Local Variables Consistency)
The predicate consisvarC -IL+IPI denotes local variables consistency. It states
that the local variables and parameters are stored properly. The variables that are
passed in registers according to the compiler information are stored in the registers
only for the topmost frame. We make a case distinction in our definition of the
consistency relation for non-topmost frames. If the subsequent frame belongs to
the ipihandler , then we compare the values of the local variables in caller-save
registers to the registers in the IPCB1. For the other frames they are on the stack
either in the callee-save or in the caller-save regions of the corresponding frame.
consisvar_{C-IL+IPI}(s ∈ frame*_{C-IL}, pi ∈ prog_{C-IL}, info ∈ info_{C-IL}, core ∈ C_CORE,
                     m ∈ B^32 → B^8) ∈ B def=
  ∀i, j ∈ N. i < |s| ∧ j < info.lvarnum(s.f_i) ⟹ s.M_i(v_j) =
    core.gpr(reg)
        if reg ≠ ⊥ ∧ i = top(s)
    m_4(ba_i − bin_32(info.regoff(f, reg)))
        if reg ≠ ⊥ ∧ i < top(s) ∧ s.f_{i+1} ≠ ipihandler
    m_4(ipcbba(i) + bin_32(reg · 4))
        if reg ≠ ⊥ ∧ i < top(s) ∧ s.f_{i+1} = ipihandler
    m_{size_θ(qt2t(t_j))}(ba_i + bin_32(4 + info.paroff(v_j, s.f_i)))
        if reg = ⊥ ∧ j < npar
    m_{size_θ(qt2t(t_j))}(ba_i − bin_32(info.lvaroff(v_j, s.f_i)))
        if reg = ⊥ ∧ j ≥ npar
where
  (v_j, t_j) = (pi.F(s.f_i)).V[j]
  reg = info.lvarreg(v_j, s.f_i, s.loc_i)
  npar = (pi.F(s.f_i)).npar
  ba_i = ba_{C-IL}(i, s, info)
1We have already defined IPCB in Section 4.2.
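The five-way case distinction of Definition 6.17 can be summarised as a decision procedure that says where the value of a variable is expected to live. The sketch below is a simplified illustration, not the thesis's definition: it returns a descriptive tag instead of reading memory, None plays the role of ⊥, and the function and tag names are hypothetical.

```python
from typing import Optional

IPI_HANDLER = "ipihandler"


def lvar_location(i: int, top: int, reg: Optional[int],
                  next_frame_fn: Optional[str], j: int, npar: int) -> str:
    """Where the j-th variable of frame i is expected to be stored.

    reg is the register assigned by the compiler information (None for ⊥),
    next_frame_fn is the function of frame i+1 (None if i is the top frame).
    """
    if reg is not None:
        if i == top:
            return "gpr"               # live in the register itself
        if next_frame_fn == IPI_HANDLER:
            return "ipcb"              # saved by the ISR, not by pre-call code
        return "frame-save-area"       # callee-/caller-save region of the frame
    # Variables without a register live on the stack.
    return "stack-parameter" if j < npar else "stack-local"
```

The only case that differs from plain C-IL consistency is the "ipcb" one: for the interrupted frame the register contents are compared against the IPCB rather than against a save area on the stack.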
We integrate the required changes of control and local variables consistency
in the definitions of stack consistency, local consistency and compiler consistency
in the next three definitions.
Definition 6.18 (C-IL+IPI Stack Consistency)
The predicate consisstackC -IL+IPI denotes stack consistency as a wrapper for
register, previous base pointer and local variables consistency.
consisstackC -IL+IPI (s ∈ frame∗C-IL, info ∈ infoC-IL,
core ∈ CCORE ,m ∈ B32 → B8) ∈ B def=
consisregsC-IL(s, info, core)
∧ consispbpC-IL(s, info,m)
∧ consisvarC -IL+IPI (s, info, core,m)
Definition 6.19 (C-IL+IPI Local Consistency)
The predicate consis localC -IL+IPI denotes local compiler consistency.
consis localC -IL+IPI (s ∈ frame∗C-IL, info ∈ infoC-IL,
core ∈ CCORE ,m ∈ B32 → B8) ∈ B def=
consiscontrolC -IL+IPI (s, info, core,m)
∧ consisstackC -IL+IPI (s, info, core,m)
Definition 6.20 (C-IL+IPI Compiler Consistency)
The predicate consisCC -IL+IPI denotes compiler consistency.
consisCC -IL+IPI (c ∈ CCC -IL+IPI , info ∈ infoCC -IL, h ∈ CM , i ∈ Pid) ∈ B def=
consisglobalCC -IL(chw2cil(c), info, h)
∧ consis localC -IL+IPI ((chw2cil(c))(i).s, info(i), h.corei, h.m)
C -IL+IPI compiler consistency exposes some crucial differences between C-
IL and C -IL+IPI . Obviously in the presence of an ipihandler frame on the
stack C -IL+IPI consistency does not imply C-IL consistency and vice versa. In
the context of a simulation proof between C -IL+IPI and MIPSP it is natural
to want to make use of the C-IL compiler correctness theorem for the portions
in a C -IL+IPI execution consisting only of C-IL steps. With the C -IL+IPI
extensions of compiler consistency we cannot do this. Instead we need a theorem
that, similarly to Theorem 5.1, talks about simulation blocks of steps of a single
C-IL+IPI thread between two consecutive consistency points. In contrast to the
C-IL compiler correctness theorem, we now also want to talk about the execution
of interrupt handlers and maintain the extended ownership safety. The special
cases of starting and ending an interrupt thread execution, however, remain
excluded from the theorem. This limitation is expressed by the absence of JISR
and eret instructions in the MIPSP execution. Thus the code neither pushes an
ipihandler frame onto the stack nor pops one from it.
Theorem 6.1 (CC-IL+IPI Compiler Correctness) The compiler correctness
theorem for C-IL+IPI says that, if
• pi ∈ progC-IL is a C-IL program,
• θ ∈ contextC-IL is the context of pi,
• info ∈ infoCC -IL is the static compiler information,
• h0 ∈ CM is the initial configuration of the machine executing the compiled
program,
• h0 β→ h′ is an arbitrary IP-schedule execution of the compiled program,
• hk is some hypervisor consistency point of any processor i in h0 β→ h′,
• hl is the next consistency point after hk,
• between hk and hl the MIPSP machine executes neither an eret instruction nor
a JISR step,
• cˆ is a CC-IL configuration which is a consistency point of thread i,
• apic is a C apicCC-IL configuration,
• c is a CC-IL+IPI configuration built from cˆ and apic,
• CC -IL+IPI consistency holds for all running threads between c and hk,
then there exists a configuration c′ such that consistency holds between c′ and hl
for all running threads. c′ is either equal to c, or it is the next consistency point
of thread i and is obtained from c by processing some C-IL steps of thread i.
Furthermore, for AC-IL, AroC-IL, AMIPSP and AroMIPSP defined as in Section
5.3.4 and Section 3.2.2 if
• oc and o′c are a valid CC-IL+IPI ownership pair, such that the execution
from c to c′ is ownership-safe according to oc and o′c,
• and om and o′m are a MIPSP ownership pair consistent with oc and o′c,
then also the corresponding MIPSP execution from hk to hl is ownership-safe
according to om and o′m.
If the last instruction of the considered MIPSP sub-sequence is a store word
to the APIC, then the statement to be executed in c is a function call to the
external function sendipi .
Formally we state the theorem as follows:
∀h^0, β. IPsched(h^0, β)
  ∧ ∀ĉ, apic, o_c, o_m, i, k ∈ N ∧ k < |β|.
      hypCP_{C-IL+IPI}(h^0, i, β, k)
    ∧ l = nextCP(h^0, β, k)
    ∧ consisO_XT(o_c, o_m)
    ∧ CP^{pi,θ}_{C-IL}(ĉ(i))
    ∧ c = cil2chw(ĉ, apic)
    ∧ (∀j ∈ Pid. running_j(β, k) ⟹ consis_{CC-IL+IPI}(c, info, h^k, j))
    ∧ (∀j ∈ [k : l − 1]. ¬(jisr_IPI(β_j) ∨ iint(h^j.core_i, I(h^j.core_i, h^j.M))
                              ∨ eret(I(h^j.core_i, h^j.M))))
  ⟹ ∃c′. pi, θ ⊢ c →*_i c′
  ⟹ (CP^{pi,θ}_{C-IL}(ĉ′(i))
      ∧ (∀j ∈ Pid. running_j(β, l) ⟹ consis_{CC-IL+IPI}(c′, info, h^l, j))
      ∧ (∀o′_c, o′_m. consisO_XT(o′_c, o′_m) ∧ safeseq^{pi,θ}_{C-IL+IPI}(c(i), c′(i), i, o_c, o′_c)
            ⟹ safeseq_XT(h^k, β[k : l − 1], o_m, o′_m))
      ∧ (storeA_apic(h^{l−1}.core_i, I(h^{l−1}.core_i, h^{l−1}.m))
            ⟹ stmt_next(apic(i), pi) = call sendipi()))
Proof The proof of this theorem is analogous to the proof of Theorem 5.1 and,
like that proof, it is outside the scope of this thesis. We note that skipping the
proof of the C-IL+IPI compiler correctness theorem does not introduce any
additional gap in our theory, since it is based on the same arguments as the C-IL
compiler correctness theorem. It has the same inductive structure with an
additional case distinction on the execution context, i.e. execution of an interrupt
thread or of a program thread. The condition about the absence of JISR and eret
steps implies that the number of existing ipihandler frames on the stack does
not change in the induction step.
∀n. c(i).fn = ipihandler ⇐⇒ c′(i).fn = ipihandler
Thus we can deduce consistency from the induction hypothesis.
Considering the transfer of the ownership-safety from the C -IL+IPI to the
MIPSP level we again use the same argumentation as in Theorem 5.1. Given
that the C-IL program obeys the stronger ownership rules we do not need any new
type of proof techniques to claim that the compiled code is also ownership-safe
according to the extended ownership. We just consider the ownership transfer
separately for the execution of the handler and of the program and talk about
subsets of the joint ownership address space.
□
In addition to the consistency conditions in our simulation relation we have to
consider the consistency of the apic components.
Definition 6.21 (HW Consistency)
The predicate hw -consis denotes that the Boolean flags in the apic component
of a given thread i are consistent with the corresponding bits in the APIC of
processor i, which are related to software IPIs.
hw -consis(c ∈ CCC -IL+IPI , h ∈ CM , i ∈ Tid) ∈ B def= h.irri[0] = c.apic[i].IRR
∧ h.isri[0] = c.apic[i].ISR
∧ h.icri.DS = c.apic[i].DS
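As an illustration, hw-consis compares three Boolean flags of the abstract apic component with three bits of the hardware APIC. The Python sketch below assumes a deliberately reduced data model: CilApic and HwApic are hypothetical record types, and the hardware IRR and ISR are modelled as bit tuples of which only index 0 (the software IPI vector in the definition above) is inspected.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CilApic:
    """Abstract per-thread apic flags of the C-level configuration."""
    IRR: bool
    ISR: bool
    DS: bool


@dataclass(frozen=True)
class HwApic:
    """Reduced view of a hardware APIC (assumed layout, for illustration)."""
    irr: Tuple[bool, ...]  # interrupt request register bits
    isr: Tuple[bool, ...]  # in-service register bits
    icr_ds: bool           # delivery status bit of the interrupt command register


def hw_consis(c_apic: CilApic, hw: HwApic) -> bool:
    # Only bit 0 of IRR/ISR is compared, as in h.irr_i[0] and h.isr_i[0].
    return (hw.irr[0] == c_apic.IRR
            and hw.isr[0] == c_apic.ISR
            and hw.icr_ds == c_apic.DS)
```

Bits of the hardware registers other than index 0 are ignored by this check, which mirrors the fact that the definition only relates the bits belonging to software IPIs.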
Definition 6.22 (CC-IL+IPI Simulation Relation)
The predicate SRMIPSPCC -IL+IPI denotes the simulation relation between CC -IL+IPI
and MIPSP .
SR^{MIPSP}_{CC-IL+IPI}(c ∈ C_{CC-IL+IPI}, info ∈ info_{CC-IL}, h ∈ C_M, i ∈ Pid) ∈ B def=
    consisglobal_{CC-IL}(chw2cil(c), info, h)
  ∧ consislocal_{C-IL+IPI}((chw2cil(c))(i).s, info(i), h.core_i, h.m)
  ∧ hw-consis(c, h, i)
6.4.2 Consistency Points
Executions with IPIs contain more consistency points than the ones defined in
Section 5.3.1. Therefore we extend the old definitions. We define consistency
points based on the set of MIPSP interleaving points. As part of the simulation
proof that we present in the next section, we prove that the CC-IL+IPI
simulation relation holds in the states which we define to be consistency points.
We add to the set of hypervisor consistency points the state after the execution
of an eret instruction at the end of the IPI service routine.
Definition 6.23 (Hypervisor Consistency Point of Processor i)
The predicate hypCP_{C-IL+IPI} denotes whether the k-th step in the execution
h →^β h′ is a hypervisor consistency point for processor i.
hypCP_{C-IL+IPI}(h ∈ C_M, i ∈ Pid, β ∈ (Σ_M)*, k ∈ [0 : |β| − 1]) ∈ B def=
    i = β_k.pid ∧ core(β_k) ∧ h^k.pc_{β_k.pid} ∈ A_cp
  ∨ eret_IPI(h^{k−1}, β_{k−1})
We overload the definition of MIPSP consistency points corresponding to the
new definition of hypervisor consistency points.
Definition 6.24 (MIPSP Consistency Point)
The predicate CP_MIPSP denotes whether the k-th step in the execution h →^β h′
is a consistency point.
CP_MIPSP(h ∈ C_M, β ∈ (Σ_M)*, k ∈ [0 : |β| − 1]) ∈ B def=
    ∃i ∈ Pid. hypCP_{C-IL+IPI}(h, i, β, k) ∨ k = min{j | i = β_j.pid}
  ∨ guestIP(h, β, k) ∨ ipi(β_{k−1})
In the set of software consistency points we do not need changes. One might
expect the initial states of IPI steps, i.e. states in which a DS flag is set, to be
consistency points. According to the CC-IL+IPI operational semantics, such
configurations follow steps executing the function sendipi. Like all function
calls, calls to sendipi start and end in consistent states. Thus we do not need
additional conditions here.
6.4.3 CC-IL+IPI Simulation Theorem
The simulation theorem that we define and prove in this section differs from
Theorem 5.1 in several aspects.
First of all, as we have already mentioned on numerous occasions, we cover
MIPSP executions with IPI handling. We have to consider that in CC-IL+IPI
IPIs happen on the border between statements. This is not true for MIPSP
executions in general, and not even for IP schedule MIPSP executions. Thus
proving a simulation between CC-IL+IPI executions and MIPSP executions is
feasible only for those MIPSP executions that satisfy the schedXT predicate.
Second, we also consider guest steps. By doing that we prove simulation of
complete execution sequences and not only of their hypervisor fragments. Since
guest steps in MIPSP are quite restricted, this extension is trivial here. For a
MIPSP that is closer to X86 concerning steps in guest mode, one would have
to consider guest instruction execution and context switch on the one hand and
MMU steps on the other hand. We have already sketched how the consistency
relation has to be adapted and what the conditions on the context switch and
the guest execution are. MMU steps have been examined in [Kov13], where, in an
approach similar to ours, the C-IL semantics has been extended and a simulation
theorem has been proven. We are convinced that our theory is applicable to a
full-size MIPS and can easily be combined with the approach from [Kov13].
According to this simulation we want to transfer ownership safety proven on
the CC-IL+IPI level to extended IP schedule MIPSP executions. Based on the
Simulation Theorem 4.12, we can then conclude ownership safety for IP schedule
MIPSP executions. For that purpose the theorem's claim has to be extended by
a conjunct saying that the original IP schedule execution is safe. The proof of
this extension is trivial, since we only reorder local program thread steps, and the
ownership of the program thread does not change within the ISR blocks. Thus, if
a given local program thread step is ownership-safe at the end of the ISR block,
then it is also ownership-safe at the beginning of the ISR block. Having safety
of IP schedule executions we can use the order reduction theorem from [Bau14]
(Definition 3.30) to transfer the ownership safety to arbitrarily interleaved MIPSP
executions.
Before stating the simulation theorem we need to define formally some addi-
tional conditions.
In the initial state of both machines (CC -IL+IPI and MIPSP ) the APIC is
empty.
Definition 6.25 (Empty CC-IL+IPI APIC)
The predicate emptyapic_{CC-IL+IPI} denotes whether a given CC-IL+IPI
APIC configuration is in an initial "clear" state for all threads.
emptyapic_{CC-IL+IPI}(a ∈ C^{apic}_{CC-IL}) ∈ B def= ∀t ∈ Tid. a(t) = (0, 0, 0)
Definition 6.26 (Empty MIPSP APIC)
The predicate emptyapic_MIPSP denotes whether in a given MIPSP configuration
the APICs of all processors are in an initial "clear" state.
emptyapic_MIPSP(h ∈ C_M) ∈ B def=
  ∀i ∈ Pid. ¬irr(h.apic_i)
          ∧ ¬isr(h.apic_i)
          ∧ ¬h.DS_i
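The two initial-state predicates can be illustrated with a small sketch. It rests on assumptions that are not fixed by the definitions above: the abstract APIC is modelled as a dictionary from thread ids to (IRR, ISR, DS) triples, and a hardware APIC is reduced to three Boolean fields, where irr and isr stand for the results of the predicates irr(h.apic_i) and isr(h.apic_i).

```python
from typing import Dict, Tuple


def empty_apic_cil(apic: Dict[int, Tuple[int, int, int]]) -> bool:
    """All threads have the clear flag triple (IRR, ISR, DS) = (0, 0, 0)."""
    return all(flags == (0, 0, 0) for flags in apic.values())


def empty_apic_hw(apics: Dict[int, Dict[str, bool]]) -> bool:
    """No processor has a pending request, an IPI in service, or a set DS bit.

    Each entry is assumed to look like {"irr": bool, "isr": bool, "ds": bool}.
    """
    return all(not a["irr"] and not a["isr"] and not a["ds"]
               for a in apics.values())
```

With both predicates true in the initial configurations, hw-consis holds trivially at the start, since all compared bits are zero on both levels.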
In the proof of the simulation theorem we will rely on properties of the
CC -IL+IPI and MIPSP execution as depicted in Figure 6.4. We capture the
important properties of these executions in the following two lemmas.
Lemma 6.2 (Pre-handler Consistency) The predicate preIPIH V states that
if:
• h (h0 from Figure 6.4) is a MIPSP configuration with a pending IPI,
• h′ (h1 from Figure 6.4) is the configuration reached by executing JISR in h,
• h′′ (h3 from Figure 6.4) is the configuration reached by executing in h′ the
assembler code of the ISR that saves the context and calls the C-IL
ipihandler, followed by the prolog generated by the compiler for ipihandler,
• c (c0 from Figure 6.4) is a CC-IL+IPI configuration, such that consistency
holds between c and h,
• c′ (c1 from Figure 6.4) is the configuration reached by executing the JIPISR
step in c,
• o is a valid ownership,
then consistency holds again between h′′ and c′ and all executed ISA steps are
ownership-safe (Figure 6.6).
[Figure 6.6: If c and h are consistent and we execute the JIPISR step in c and
JISR, the context-saving code and the compiler-generated prolog in h, then the
new configurations c′ and h′′ are also consistent.]
preIPIH V (h ∈ C_M, c ∈ C_{CC-IL+IPI}, info ∈ info_{CC-IL}, i ∈ Pid, o ∈ O_XT) ∈ B def=
  ∀α, β, h′, h′′. h →^{α} h′
    ∧ jisr_IPI(α, i)
    ∧ h′ →^{β}_{preIPI, i} h′′
    ∧ SR^{MIPSP}_{CC-IL+IPI}(c, info, h, i)
    ∧ c′ = frameipi(c)[apic.IRR ↦ 0, apic.ISR ↦ 1]
    ∧ ownership-inv_XT(o)
  ⟹ SR^{MIPSP}_{CC-IL+IPI}(c′, info, h′′, i)
    ∧ safeseq_XT(h, αβ, o, o)
where
  preIPI = ISR1, pcode_ipihandler
Proof To prove the consistency we apply the specification of δjisr, Software
Condition 9 and Software Condition 11. For the safety we rely on the following
conditions:
• JISR is safe by definition.
• The ISR code accesses only Aipcb and the stack memory.
• The prolog code accesses only the stack memory.
• All accesses to Aipcb and the stack memory are safe because they are owned
due to ownership invariant.
• There is no ownership transfer in place. All executed steps access only
memory addresses in the IPCB and in the handler stack region (i.e. below
h.rsp).
□
Lemma 6.3 (Post-handler Consistency) The predicate postIPIH V states that
if:
• h is a MIPSP configuration with program counter pointing to the first
instruction of ipihandler ’s return statement (h4 from Figure 6.4),
• h′ is the configuration that we reach when we execute in h the epilog
generated by the compiler for ipihandler and the assembler code from the
ISR for restoring the context (h6 from Figure 6.4),
• c is a CC -IL+IPI configuration, such that consistency holds between c
and h (c2 from Figure 6.4),
• c′ is the configuration that we reach after executing the return step in c (c3
from Figure 6.4),
• o is a valid ownership,
then consistency will hold again between h′ and c′ and all executed ISA steps are
ownership-safe (Figure 6.7).
[Figure 6.7: If c and h are consistent and we exit the ISR by executing the
compiler-generated epilog and the context-restoring code in h, and the ipihandler
return statement step in c, then the new configurations c′ and h′ are also
consistent.]
postIPIH V (h ∈ C_M, c ∈ C_{CC-IL+IPI}, info ∈ info_{CC-IL}, i ∈ Pid, o ∈ O_XT) ∈ B def=
  ∀β, h′. h →^{β}_{postIPI, i} h′
    ∧ ownership-inv_XT(o)
    ∧ SR^{MIPSP}_{CC-IL+IPI}(c, info, h, i)
    ∧ c′ = dropframe(c)[apic.ISR ↦ 0]
  ⟹ SR^{MIPSP}_{CC-IL+IPI}(c′, info, h′, i)
    ∧ safeseq_XT(h, β, o, o)
where
  postIPI = ecode_ipihandler, ISR2
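On the C level, the two lemmas bracket a handler run with two abstract transitions: c′ = frameipi(c)[apic.IRR ↦ 0, apic.ISR ↦ 1] on entry and c′ = dropframe(c)[apic.ISR ↦ 0] on return. The following Python sketch illustrates only this bookkeeping for a single thread. The Thread record, the reduction of frames to function names, and the guards are assumptions made for the example; the actual frameipi and dropframe are defined elsewhere in the thesis.

```python
from dataclasses import dataclass, replace
from typing import Tuple

IPI_HANDLER = "ipihandler"


@dataclass(frozen=True)
class Thread:
    stack: Tuple[str, ...]  # function names, last element is the top frame
    IRR: bool               # an IPI is requested
    ISR: bool               # an IPI is in service


def jipisr(t: Thread) -> Thread:
    """Entry step: push a handler frame, clear IRR, set ISR."""
    if not t.IRR or t.ISR:
        # Assumed guard: a request is pending and no handler is running.
        raise ValueError("JIPISR step not enabled")
    return replace(t, stack=t.stack + (IPI_HANDLER,), IRR=False, ISR=True)


def handler_return(t: Thread) -> Thread:
    """Return step of ipihandler: drop the handler frame, clear ISR."""
    if not t.stack or t.stack[-1] != IPI_HANDLER or not t.ISR:
        raise ValueError("not at the return of ipihandler")
    return replace(t, stack=t.stack[:-1], ISR=False)
```

Running jipisr followed by handler_return leaves the stack of the interrupted program thread unchanged, which is the C-level counterpart of the hardware restoring the saved context before eret.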
Proof To prove this lemma we apply the specification of δeret, Software Condition
12 and Software Condition 9 and argue on the safety similarly to the previous proof.
□
[Figure 6.8: Simulation of a MIPSP execution sequence by a CC-IL+IPI
computation.]
Theorem 6.4 (Simulation Theorem CC -IL+IPI ) The simulation theorem says
that, if
• pi ∈ progC-IL is a C-IL program,
• θ ∈ contextC-IL is the context of pi,
• info ∈ infoCC -IL is the static compiler information,
• h0 ∈ CM is the initial configuration of the machine executing the compiled
program,
• h0 β→ hn is an arbitrary extended IP-schedule execution of the compiled
program,
• in h0 all cores are in system mode,
• in h0 all APICs are clear,
• apic0 is an empty CC -IL+IPI APIC record,
• cˆ0 ∈ CCC -IL is the initial CC -IL configuration of the program execution,
• all threads are at a consistency point in cˆ0,
• c0 ∈ CCC -IL+IPI is the initial CC -IL+IPI configuration in which cˆ0 is
extended by apic0,
• the simulation relation holds for all running threads between h0 and c0,
• pi ∈ progC-IL is an ownership-safe C-IL program according to an initial
ownership oc,
• AC-IL, AroC-IL, AMIPSP and AroMIPSP are defined as in Section 5.3.4 and
Section 3.2.2 and om is a MIPSP ownership consistent with oc,
then there exists a step function s and an execution sequence c0, c1, . . . , cs(|β|)
such that for all consistency points hk the simulation relation holds between hk and
cs(k) for all running threads (Figure 6.8). Furthermore, there exists an ownership
o′m, such that the MIPSP execution from h0 to hk is ownership-safe according to
om and o′m.
∀h^0, β, o_c, o_m, ĉ^0, apic^0.
    sched_XT(h^0, β)
  ∧ ∀i ∈ Pid. ¬guest(h^0.core_i)
  ∧ emptyapic_MIPSP(h^0)
  ∧ emptyapic_{CC-IL+IPI}(apic^0)
  ∧ ∀i ∈ Pid. CP^{pi,θ}_{C-IL}(ĉ^0(i))
  ∧ c^0 = cil2chw(ĉ^0, apic^0)
  ∧ ∀i ∈ Pid. SR^{MIPSP}_{CC-IL+IPI}(c^0, info, h^0, i)
  ∧ safeprog^{pi,θ}_{CC-IL+IPI}(c^0, o_c)
  ∧ consisO_XT(o_c, o_m)
  ⟹ ∃s. ∀k ≤ |β|. CP_MIPSP(h^0, β, k) ⟹
        (∀i ∈ Pid. running_i(β, k) ⟹ CP^{pi,θ}_{C-IL}(chw2cil(c^{s(k)})(i))
                                      ∧ SR^{MIPSP}_{CC-IL+IPI}(c^{s(k)}, info, h^k, i))
      ∧ (∃o′_m. safeseq_XT(h^0, β[0 : k − 1], o_m, o′_m))
Proof We prove the simulation between h and c by induction on the number cpr
of consistency points reached. In the proof we match the current consistency
point index cpr to the configuration hk.
Induction base cpr = 0
In this case we set k = 0.
CP_MIPSP(h^0, β, 0) follows from the definition of hardware consistency points.
From the hypothesis we have:
∀i ∈ Pid. SRMIPSPCC -IL+IPI (c0, info, h0, i).
Hence, we choose s(0) = 0. All running threads are at a consistency point since
chw2cil(cil2chw(cˆ0, apic0)) = cˆ0.
We choose o′m = om. The safety for an empty sequence
safeseq_XT(h^0, ε, o_m, o_m)
is defined by the ownership invariant of om. The ownership invariant of om is
guaranteed by the hypothesis as a conjunct of our ownership consistency predicate.
Induction step cpr → cpr + 1
In the induction step we consider inductively the execution between two consecu-
tive consistency points hk and hl, k < l. The hypothesis of our theorem is defined
on the initial state and does not depend on our induction parameter. Thus for
proving the claim of the theorem for l we can rely on the hypothesis and on the
claim for k, i.e. the consistency after the first k steps of the abstract hardware
machine.
∃s, k < |β|. CPMIPSP (h0, β, k) =⇒
(∀i ∈ Pid. running i(β, k) =⇒ CPpi,θC-IL(chw2cil(cs(k))(i))
∧ SRMIPSPCC -IL+IPI (cs(k), info, hk, i))
∧ ∃okm. safeseqXT (h0, β[0 : k − 1], om, okm)
We execute in this setting the steps between the current (k) and the next (l =
nextCP(h0, β, k)) consistency point and define the steps of the C-IL machine
such that the following holds.
∃s(l) ≥ s(k), l < |β|. CP_MIPSP(h^0, β, l) ⟹
    (∀i ∈ Pid. running_i(β, l) ⟹ CP^{pi,θ}_{C-IL}(chw2cil(c^{s(l)})(i))
                                  ∧ SR^{MIPSP}_{CC-IL+IPI}(c^{s(l)}, info, h^l, i))
  ∧ ∃o′_m. safeseq_XT(h^0, β[0 : l − 1], o_m, o′_m)
We do a case split on the type of the consistency point hk:
Case 1: hk is a guest consistency point (guestIP(h, β, k)).
This means that the next step βk is either a guest step or VMEXIT.
In this case the next consistency point is the subsequent state, i.e. l =
k + 1. Both steps are not visible on the C-IL level, thus s(l) = s(k).
Neither step changes relevant parts of the hardware state. It follows
that the execution of a guest step or VMEXIT maintains our
consistency.
SR^{MIPSP}_{CC-IL+IPI}(c^{s(k)}, info, h^k, i) ⟹ SR^{MIPSP}_{CC-IL+IPI}(c^{s(l)}, info, h^l, i)
Guest steps and VMEXIT are safe and do not change the ownership.
Thus we choose o′_m = o^k_m.
    safeseq_XT(h^0, β[0 : k − 1], o_m, o′_m)
  ∧ safestep_XT(h^k, β_k, o′_m, o′_m)
  ⟹ safeseq_XT(h^0, β[0 : l − 1], o_m, o′_m)
We recall that the C-IL compiler consistency can be defined with a case
distinction on the processor mode: in system mode we consider the real
registers, and in guest mode we would have to talk about the saved
registers. Such a consistency relation requires a more sophisticated ISA
model than MIPSP. It would provide semantics for guest instruction
execution, guest/hypervisor mode switch and an intercept model. To maintain
consistency during guest steps we then have to guarantee:
• that guest steps do not change the memory regions for the stack
and for the saved hypervisor context and
• that guest steps do not change the APIC state.
For VMEXIT we would have to guarantee:
• that it restores properly the stored context and
• that it does not change the APIC state.
Details of this argumentation are given in [PBLS16].
Case 2: hk is a hypervisor consistency point, i.e. hypCP_{C-IL+IPI}(h^0, i, β, k).
We have two possible cases:
• The program thread is running (i.e. the ISR bit is not set) and
there is no pending IPI. Thus the next step is a program step of the
hypervisor.
• The interrupt thread is running (i.e. the ISR bit is set). Thus the
next step is a step executing ipihandler.
We are considering extended IP schedules, thus we know that between
the current and the next consistency point either only program steps of
the hypervisor or only ipihandler steps are executed. We want to apply
the C-IL+IPI compiler correctness theorem to the whole execution block
starting at hk and ending in the next consistency point. For that we
first have to show that the hypothesis of Theorem 6.1 is satisfied. Then
we use the theorem to derive our proof goal.
We apply the theorem with obvious instantiations of variables, e.g. the
initial CC -IL configuration is cˆ = chw2cil(cs(k)).
• The first four conditions from the compiler correctness theorem's
hypothesis are trivially satisfied.
• From the induction hypothesis we know that all running threads
are at a consistency point. We are examining steps of processor i.
This implies that i is running in k and therefore
CP^{pi,θ}_{C-IL}(chw2cil(c^{s(k)})(i)) holds.
• The C-IL+IPI consistency between cs(k) and hk for all running
threads follows from the C-IL+IPI simulation relation between
cs(k) and hk.
• The absence of JISR and eret instruction steps is satisfied by the
definition of the current proof case.
Thus we can apply the C -IL+IPI compiler correctness theorem (Theo-
rem 6.1) and claim C -IL+IPI consistency for all running threads between
cs(l) and hl.
For the APIC consistency we have to consider possible changes due to
writes to Aapic . Such steps are I/O steps and the considered block may
only contain a single I/O step executed at the end. Therefore we need
another case split on the last instruction executed by the hypervisor. Be-
fore looking into the two cases we recall that according to the C -IL+IPI
compiler correctness theorem writes to Aapic in compiled code correspond
to function calls to sendipi on the C -IL+IPI level.
• If βl−1 is not a write to Aapic, we know that the APIC states of
the machines hk and hl are equivalent. On the C-IL+IPI level the
APIC state also stays unchanged, so hw-consis holds trivially and
consequently consistency holds.
• If βl−1 is a write to Aapic, the current statement in cs(k) can only be
a function call to sendipi. The execution of the statement changes
the APIC state in the C-IL+IPI configuration in the same way as
the execution of βl−1 does on the processor. After the two
steps are executed hw-consis holds again.
Concerning the ownership safety we need to consider that the safety
conjunct in the claim of Theorem 6.1 is an implication. Thus we first
have to prove safety of the C-IL steps between chw2cil(cs(k)) and cs(l).
From safety of the program, safeprog^{pi,θ}_{CC-IL+IPI}(c^0, o_c), which is part
of our hypothesis, we can deduce safety for every step, which implies safety
for the execution that we consider. Knowing that the C-IL steps are safe
we get safety of the hardware steps.
Case 3: We are at a consistency point before an IPI step, i.e. ipi(βk).
From the guard of this transition we know that there is a processor in hk
whose DS bit is set. Due to the simulation relation the same holds on the
C-IL level, therefore we can execute the IPI step in c.
Then l = k + 1 and cs(l) is obtained from cs(k) according to Definition 6.9.
c^{s(l)} = c^{s(k)}[apic ↦ sendipi(c.apic, i)]
The IPI step is safe and changes the APIC state equivalently on both
machines, leaving everything else including the ownership unchanged. Thus
hw-consis is maintained and everything else follows from the induction
hypothesis.
Case 4: We are in a consistency point with an external interrupt request. The
next consistency point is the one placed by the compiler at the beginning
of the execution of the C-IL handler.
We define cs(l) by executing a JIPISR step and apply Lemma 6.2, which
also guarantees the ownership safety of the execution.
Case 5: We are in a consistency point before the return statement of the C-IL
function ipihandler. βk is a core step and executes the first instruction
of the return code of the function, i.e. the code of the last statement
h.pci = info.stmtcode(ipihandler , |pi.F(ipihandler).P| − 1).
The next consistency point is the one after the eret step.
During the service routine the APIC in-service bits are set on both
machines.
On C-IL we execute the return statement according to our extended
semantics, which pops the current frame from the stack and clears the
in-service flag. We apply Lemma 6.3, obtaining consistency and safety of
the MIPS execution.
□
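The case split of the induction step determines how far the step function s advances between two consecutive consistency points. The following Python sketch records this bookkeeping only. It is indexed by the number of consistency points reached rather than by hardware step index, the case labels are hypothetical names for Cases 1 to 5, and the assumption that the IPI, JIPISR and return cases each contribute exactly one C-level step is our reading of the proof, not a statement taken from it.

```python
from typing import List, Tuple


def build_step_function(blocks: List[Tuple[str, int]]) -> List[int]:
    """Return s as a list: s[n] is the number of C-level steps executed
    when the n-th consistency point is reached.

    blocks[n] = (kind, cil_steps), where cil_steps is the number of C-IL
    steps provided by the compiler correctness theorem for a hypervisor
    block and is ignored for every other kind.
    """
    s = [0]  # induction base: s(0) = 0
    for kind, cil_steps in blocks:
        if kind == "guest":            # Case 1: invisible on the C level
            delta = 0
        elif kind == "hypervisor":     # Case 2: zero or more C-IL steps
            if cil_steps < 0:
                raise ValueError("negative number of C-IL steps")
            delta = cil_steps
        elif kind in ("ipi", "jipisr", "return"):  # Cases 3, 4, 5
            delta = 1
        else:
            raise ValueError(f"unknown consistency point kind: {kind}")
        s.append(s[-1] + delta)
    return s
```

By construction the resulting sequence is monotone, which corresponds to the requirement s(l) ≥ s(k) in the induction step.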
Chapter 7
Conclusion
In this thesis we describe a model and a theory of how interrupts can be integrated
into a pervasive verification approach for system software written in C. In the Verisoft
XT project we used semantics, which should be considered incomplete, since they
do not contain interrupts. The verification tool VCC is designed and developed
for verifying multithreaded C programs. Verification of system software is more
complex and requires adjustments of the tool’s computational logic. In VCC the
program thread and the handler thread are considered as two parallel C threads.
Hardware parts are defined as hybrid objects, which are saved as ghost objects but
do not follow the restriction, that information cannot flow from ghost to program
memory. VCC proofs in this methodology and especially an IPI verification as
described in [ACHP10] are incomplete and based on a series of assumptions. In
this work we addressed and closed some of the gaps in the underlying theory.
• To the best of our knowledge we present the first C semantics with
interrupts occurring during execution of the C portion of the program.
(Interrupts occurring during the execution of assembly portions were already
treated in [PBLS16].) The communication protocol from [ACHP10] and the
implementation of AHV and Hyper-V use a bit vector, which served in our
verification as an ad hoc abstraction of the local APIC. With the CC-IL+IPI
semantics we are able to replace this bit vector with a representation that
fits the hardware model more precisely. Moreover CC-IL+IPI allows us to
talk about interrupts on the C level.
• We have proven a simulation and the transfer of ownership-safety between
our new C semantics and a realistic hardware model with reordered
executions.
• To justify the reordered schedule and enable the transfer of ownership-
safety to arbitrarily interleaved executions we have proven an order reduction
theorem to reorder handler executions to consistency points.
Since the VCC logic can interleave threads only at the boundaries between
C statements, any handler verification in VCC has to assume that interrupts
occur only at such boundaries. With the reordering theory presented in this
thesis we have justified this assumption.
• This thesis gives detailed justification for a VCC based code verification ap-
proach of C code with interrupts. The presented semantics as an extension
of the computational model of the code-verifier reduces the need of meta
arguments in system verification and contributes to the soundness of the
verification tool.
• In the appended publication (Chapter 8) we present a generic and modular
approach to specifying and verifying parallel interprocess communication,
that allows the recovery of procedural abstraction. Furthermore the ap-
proach was applied in the verification of real IPI code (i.e. in an industrial
and in an academic hypervisor).
As future work VCC has to be extended with a mechanism, which will ensure
that the program thread and the handler thread run exclusively. Such an extension
is still missing, but we have defined semantics against which it has to be proven
sound.
Chapter 8
Appendix: Modular
Specification and Verification of
Interprocess Communication
In this chapter we present joint work on a specification and verification method
for Interprocess Communication (IPC). The results have been presented at the
2010 Formal Methods in Computer Aided Design conference (FMCAD10) and
published in the conference’s proceedings [ACHP10]. The paper was written by
Eyad Alkassar, Ernie Cohen, Mark Hillebrand and Hristo Pentchev. All authors
contributed to the specification of the generic and modular approach described in
Section 8.4. The application of the method in the verification of the TLB Flush
Example (Section 8.5) and IPIs in the Verisoft XT academic hypervisor and in
Microsoft's Hyper-V™ hypervisor (Section 8.6) was mostly done by the author
of this thesis.
Abstract
The usual goal in implementing IPC is to make a cross-thread procedure
call look like a local procedure call. However, formal specifications of IPC
typically talk only about data transfer, forcing IPC clients to use additional
global invariants to recover the sequential function call semantics. We pro-
pose a more powerful specification in which IPC clients exchange knowledge
and permissions in addition to data. The resulting specification is polymor-
phic in the specification of the service provided, yet allows a client to use IPC
without additional global invariants. We verify our approach using VCC, an
automatic verifier for (suitably annotated) concurrent C code, and demon-
strate its expressiveness by applying it to the verification of a multiprocessor
flush algorithm.
8.1 Introduction
Procedural abstraction - the ability for the caller of a procedure to abstract a
procedure call to a relation between its pre- and poststates - is one of the most
important structuring mechanisms in all of programming methodology. The cen-
tral role of procedural abstraction is reflected in the fact that it is built into not
only all modern imperative languages, but also into most program logics and
verifiers for such languages. However, in a concurrent or distributed system, pro-
cedure calls between threads are provided only indirectly through system calls or
libraries for interprocess communication (IPC). This raises the question of how
such libraries might be specified so as to provide procedural abstraction to their
clients, and how such libraries can be verified to meet these specifications. In this
paper, we consider the problem in the context of multithreaded C software, with
threads executing in a single shared address space.
To see why this problem is nontrivial, consider a simple implementation where
all data is passed through shared memory, and where each ordered pair of caller-
callee threads share a mailbox at a fixed address. The caller makes a call by
creating a suitable call record in memory (including identification of which pro-
cedure to execute, values of the call parameters, and a place to put the return
value), writes the address of this record into the mailbox going to the callee, and
calls an IPC function to signal the callee. The callee, on receiving the signal,
reads the address of the call record from the mailbox, reads the memory to get
the call parameters, executes the call, and signals the caller. Note that all memory
accesses are sequential; the only synchronization necessary is provided by the IPC
layer.
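The mailbox scheme described above can be sketched in plain C. This is a minimal, single-threaded illustration and not the verified implementation: the names CallRecord, Mailbox, caller_post, callee_serve and twice are hypothetical, and a plain Boolean flag stands in for the signal provided by the IPC layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical call record: which procedure to run, its parameter,
   and a place to put the return value. */
typedef struct CallRecord {
    unsigned (*proc)(unsigned); /* procedure to execute */
    unsigned arg;               /* call parameter */
    unsigned ret;               /* slot for the return value */
} CallRecord;

/* One mailbox per ordered caller-callee pair. The flag `signaled` is only
   a stand-in for the IPC signal; a real IPC layer synchronizes here. */
typedef struct Mailbox {
    CallRecord *record;
    bool signaled;
} Mailbox;

/* Caller side: publish the address of the record and signal the callee. */
void caller_post(Mailbox *mb, CallRecord *rec)
{
    assert(!mb->signaled);      /* no call may be pending */
    mb->record = rec;
    mb->signaled = true;
}

/* Callee side: fetch the record, execute the call, signal completion. */
void callee_serve(Mailbox *mb)
{
    assert(mb->signaled && mb->record != NULL);
    CallRecord *rec = mb->record;
    rec->ret = rec->proc(rec->arg);
    mb->signaled = false;       /* hands the record back to the caller */
}

/* Example service, used only for illustration. */
unsigned twice(unsigned x) { return 2u * x; }
```

In this sketch nothing tells the caller what it may assume about rec->ret after the call returns; recovering exactly that knowledge locally is the problem addressed in the remainder of the paper.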
Now, it’s not hard to see that the IPC layer is providing functionality similar to
a split binary semaphore, with the call records playing the role of the lock-protected
data, and the data invariant given by the semantics of the various procedure calls.
Thus, a specification for semaphores would provide a natural starting point for
a specification for IPC. However, in classical program verification, semaphore
operations are specified by their effect on global ghost state; making use of such
a specification requires additional global invariants to capture how the clients use
each semaphore. Using this kind of specification for IPC would force the client of
the remote procedure call to use these global invariants on both call and return.
This fails to faithfully capture the local character of procedural abstraction.
A second possibility is to encapsulate these global invariants inside the IPC
layer. For example, the IPC specification could be strengthened to include the
pre- and post-conditions of the procedure call. This is the sort of specification one
would find in a local logic, such as concurrent separation logic (CSL). But such
logics typically cannot specify generic semaphores, because the semaphore code
has to be polymorphic in both the encapsulated data and the data invariant.1
1A recent proposal [DDG+10] extends CSL with a facility similar to VCC ghost objects,
which should allow constructions similar to the one in this paper.
Similarly, taking this approach with the IPC code requires the specification of the
code to be polymorphic in the specification of the material being passed between
caller and callee.
We propose a different approach to specifying and verifying IPC that allows the
recovery of procedural abstraction. The key idea is that IPC routines transfer ghost
objects that own the call records, and whose invariants capture the pre- and post-
conditions of the procedures. (This is possible because we allow object invariants
to mention arbitrary parts of the state, with a semantic consistency check that
guarantees the stability of each object invariant while the object exists.) The
“contract” between caller and callee is expressed in ghost data as a binary relation
between call objects and return objects. The IPC routines can transfer ownership
of the ghost objects without knowing their types, making the transport suitably
polymorphic. This ghost scaffolding, combined with the (fixed) specification of the
IPC routines, yields for the client the sequential procedural abstraction provided
by the application function.
We have used this approach to specify and verify an IPC layer, and illustrate
its application to a multiprocessor flush algorithm. The implementation was de-
rived from a real verification target, the inter-processor interrupt (IPI) routines of
Microsoft’s Hyper-V™ hypervisor. All specifications and proofs given here have
been carried out using VCC, an automatic verifier for (suitably annotated) con-
current C.2 VCC provides the first-class ghost objects needed to carry out our
approach, while allowing the approach to be applied to real implementation code.
8.2 Related Work
Related Work - IPC The correctness of IPC has been tackled in the context
of microkernel verification. For example, the IPC implementations of the seL4
[KEH+09] and VAMOS [DDW09] kernels have been formally verified against their
respective ABIs. These projects focused on implementation correctness rather
than client usability, and specify solely data transfer.
The application of VAMOS IPC provided in [ABP09] shows the shortcomings
of this approach: there, correctness statements of the remote procedure calling
(RPC) library argue simultaneously on the sender/receiver pair instead of using
thread-local reasoning.
A number of formalisms were applied to specification and verification of in-
terprocess communication in the context of the RPC-Memory Specification Case
Study [BMS96]. None of the submitted solutions attempted to provide general-
purpose sequential procedural abstraction.
In [FSGD09] a verification framework for threads and interrupt handlers based
on CSL is described. This work is similar to ours, as both the implementation
of (thread-switching) primitives and clients using them are verified. When
threads switch, ownership is transferred and some global invariant on shared data
2Sources are available at http://www.verisoftxt.de/PublicationPage.html.
is checked. In contrast to our work the client code is interactively verified in
two different logics, whereas in our approach both are verified seamlessly and
automatically in the same proof context.
Overview The paper is structured as follows. In Section 8.3 we outline the main
VCC concepts. In Section 8.4 we present an IPC algorithm with polymorphic
specification, which we use in Section 8.5 to implement and verify a TLB flush
protocol. In Section 8.6 we extend these results to multiple senders and receivers
as required for the implementation of interprocessor interrupt (IPI) protocols used
in real, multiprocessor hypervisors. In Section 8.7 we conclude.
8.3 VCC Overview
In this section, we give a brief overview of VCC. More detailed information and
references can be found through the VCC homepage [Mic09]. To understand
the VCC view of the world, it is helpful to think of verification in a pure object
model, which is used to interpret the C memory state. Thus, we first describe
VCC concepts in terms of objects, and then describe how this is applied to C.
Table 8.1 shows a syntax overview of the constructs required for our IPC
design presented in the following sections.
Objects In VCC, the state is partitioned into a collection of objects, each with
a number of fields. Objects have addresses, so fields can be (typed) object refer-
ences. Each object has a 2-state invariant, which is expected to hold over any state
transition. These invariants can mention arbitrary parts of the state. However,
when checking an atomic update to the state, instead of checking the invariants
of all objects we want to check the invariants of only the updated objects. We
justify this by checking, for each object type, that starting from a state in which
all object invariants hold, a transition that breaks the invariant of an object of
that type must break the invariant of some modified object (not necessarily of
that type); such invariants are said to be admissible. (In addition, we have to
check that stuttering from the poststate of a transition preserves all invariants of
all objects.) Both requirements are checked for each object type when the type
is defined; this check makes use of type definitions, but not of program code.
Details can be found in [CMST10].
Within an object invariant, the (2-state) invariant of other objects can be
referred to.3 A commonly used form of this is approval : we say that an object o
approves changes to another object’s field p→f, if p has a 2-state invariant stating
that p→f stays unchanged or the invariant of o holds. In other words, any change
to p→f requires checking the invariant of o. Approval is used to express object
dependencies or build object hierarchies, e.g., VCC’s ownership model.
3This implicitly makes object invariants recursive; to guarantee that all object invariants have
a consistent interpretation, we allow such references to occur only with positive polarity.
VCC Keyword                    Description

Basics
me                             reference to current thread
this                           self-reference to object (used in type invariants)
invariant(p)                   type invariant with property p
old(e)                         evaluates e in prestate (of function, loop, or
                               2-state invariant)
closed(o)                      object o closed; invariants of o guaranteed to hold
inv(o)                         evaluates to (2-state) invariant of o
approves(o, f1, ..., fn)       changes of fields f1, ..., fn require check of o’s
                               invariant:
                               $\bigl(\bigvee_i \mathrm{old}(f_i) \neq f_i\bigr) \Longrightarrow \mathrm{inv}(o)$
atomic(o1, ..., on){s;}        marks atomic execution of s; updates only volatile
                               fields of o1, ..., on
ref_cnt(o)                     number of claims that depend on o
claims(c, p)                   invariant of claim c implies p
spec(...)                      wraps ghost code and parameters
∀(T t; ...)                    universal quantification
∃(T t; ...)                    existential quantification

Ownership
owner(o)                       owner of object o
owns(o)                        set of objects owned by object o
wrapped(o)                     o closed and owned by current thread
mutable(o)                     o not closed and owned by current thread
set_closed_owner(o, o’)        sets owner of o to o’ and extends ownership of o’
                               by o
giveup_closed_owner(o, o’)     makes o wrapped and removes it from the ownership
                               of o’

Function Contracts
requires(p)                    precondition
ensures(p)                     postcondition
writes(o1, ..., on)            function writes to objects oi

Spec Types
mathint                        mathematical integers
claim_t                        claim pointers
T2 map[T1]                     map from T1 to T2
λ(T1 t1; ...)                  lambda expression over t1

Table 8.1: VCC Keywords
Since it is unrealistic to expect objects to satisfy interesting invariants always
(e.g., before initialization or during destruction), we add to each object a Boolean
ghost field closed indicating whether the object is in a “valid” state. Implicitly,
the 2-state invariants declared with an object type are meant to hold only across
state transitions in which the object is closed in the prestate and/or the poststate.
Each object field is classified as either sequential or volatile. Volatile fields can
change while the object is closed, while sequential fields cannot. (That is, for
each sequential field, there is an implicit object invariant that says that the field
does not change while the object is closed.)
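The implicit invariant for sequential fields can be illustrated by a small executable check over a pair of states. This is only a sketch under simplifying assumptions: the type ObjState and the function sequential_field_stable are hypothetical, and "while the object is closed" is modeled here as "closed in both the prestate and the poststate".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one object: the ghost field `closed`, one
   sequential field and one volatile field. */
typedef struct ObjState {
    bool closed;
    unsigned seq_field;  /* sequential: frozen while the object is closed */
    unsigned vol_field;  /* volatile: may change while the object is closed */
} ObjState;

/* Implicit 2-state invariant for the sequential field (simplified): if the
   object is closed in prestate and poststate, the field is unchanged.
   The volatile field is deliberately not constrained. */
bool sequential_field_stable(ObjState pre, ObjState post)
{
    if (pre.closed && post.closed)
        return pre.seq_field == post.seq_field;
    return true; /* transitions of an open object are not constrained here */
}
```

A transition that only changes vol_field passes the check, whereas a transition that changes seq_field of a closed object is rejected.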
Each object has an owner, which is itself an object. It is a global system
invariant that open objects are owned only by threads, which are regular objects.
In the context of a thread t, a closed object owned by t is said to be wrapped,
while an open object owned by t is said to be mutable. Threads themselves have
invariants; essentially, the invariant of a thread t says that any transition that
does not change the state of t leaves unchanged 1. the set of objects owned by t,
2. the fields of its mutable objects, 3. the sequential fields of its wrapped objects,
and 4. the (volatile) fields of closed objects approved by t (we call such fields
thread-approved). Each object o implicitly contains an invariant that says that
its owner (as well as its owner in the prestate) approves any change to the field
o→closed and to the set of objects owned by o.4
The sequential domain of a closed object is the smallest set of object fields
that includes the sequential fields of the object and, if its set of owned objects
is declared as nonvolatile, the elements of the sequential domains of the objects
that it owns. Intuitively, the values of fields in the sequential domain of o are
guaranteed not to change as long as o remains closed.
Within program code, each memory access is classified as ordinary or atomic.
An ordinary write is allowed only to fields of mutable objects; an ordinary read is
allowed only to fields of mutable objects, to nonvolatile fields in the sequential
domain of a wrapped object, and to volatile fields of objects that are closed if
changes to the field are approved by the reading thread. In an atomic operation,
all of the objects accessed have to be known to be mutable or closed (i.e., not
open and owned by some other thread), only volatile fields of closed objects may
be written, and the update must be shown to preserve the invariants of all updated
objects. Before each atomic operation, VCC simulates running other threads by
forgetting everything it knows about the state outside of its sequential domain;
standard reduction techniques [CL98] can be used to show that we can soundly
ignore scheduler boundaries at other locations.
Ghost Objects VCC verifications make heavy use of ghost data and code (sur-
rounded by spec()), used for reasoning about the program but omitted from
concrete implementation. VCC provides ghost objects, ghost fields of structured
4By default, the set of objects owned by o is nonvolatile, and so cannot change while o is
closed. This can be overridden by declaring vcc(volatile_owns) in the type definition of o.
data types, local ghost variables, ghost function parameters, and ghost code. C
data types are limited to those that can be implemented with bit strings of fixed
length, but ghost data can use additional mathematical data types, e.g., mathe-
matical integers (mathint) and maps. VCC checks that information cannot flow
from ghost data or code to non-ghost state, and that all ghost code terminates;
these checks guarantee that program execution including ghost code simulates
the program with the ghost data and ghost code removed.
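The erasure property can be illustrated with a toy spec() macro. This is a sketch only: the macro below and the GHOST switch are assumptions made for illustration and are not the annotation macros shipped with VCC, and the function sum_below is a hypothetical example.

```c
#include <assert.h>

/* Assumption: a toy spec() macro. With GHOST defined the ghost code is
   compiled in; otherwise it is erased by the preprocessor. */
#ifdef GHOST
#define spec(...) __VA_ARGS__
#else
#define spec(...)
#endif

/* Sums 0..n-1. The ghost counter only observes the loop: no value flows
   from ghost_steps into `sum`, and the ghost code terminates. */
unsigned sum_below(unsigned n)
{
    unsigned sum = 0;
    spec(unsigned long ghost_steps = 0;)
    for (unsigned i = 0; i < n; i++) {
        sum += i;
        spec(ghost_steps++;)
    }
    spec(assert(ghost_steps == n);)
    return sum;
}
```

Compiling with or without -DGHOST yields the same return values, since the ghost statements never write to non-ghost state.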
Claims A ghost object can be used as a first-class chunk of knowledge about
the state, because the invariant of the object is guaranteed to hold as long as the
object is closed. In particular, the owner of the object does not have to worry
about the object being opened by the actions of others, so it can make use of the
object invariant whenever it needs it. Being a first-class object, the chunk can be
stored in data structures, passed in and out of functions, transferred from thread to
thread, etc. Because they are so useful, VCC provides syntactic support for these
chunks of knowledge, in the form of claims. Claims are similar to counting read
permissions in separation logic [BCOP05], but are first-class objects; this allows
claims to approve changes, be claimed, or even claim things about themselves.
Typically, a claim depends on certain other objects being closed; it is said
to “claim” these objects. Since objects are usually designed to be opened up
eventually, these “claimed” objects must be prevented from opening up as long
as the claim is closed. Concretely, this can be implemented in various ways, the
most obvious being for the dependee to track the count ref_cnt(o) of claims that
claim o, and to allow o to be opened only when ref_cnt(o) is zero, cf. [CMST10].
In constructing a claim, the user provides the set of claimed objects and invariant
of the claim; VCC checks that this invariant holds and is preserved by transitions
under the assumption that the claimed objects are closed (this check corresponds
to the admissibility check if the claim was declared with an explicit type). Any
predicate implied by this invariant is said to be “claimed” by the claim; this allows
a client needing a claim guaranteeing a particular fact to use any claim that claims
this fact (without having to know the type of the claim); to make this convenient,
VCC gives all claims the same type (claim_t); we can think of an additional
“subtype” field as indicating the precise invariant.
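The reference-counting discipline for claims can be mimicked at run time. The following is a minimal sketch under stated assumptions: Obj, Claim, claim_create, claim_destroy and obj_try_open are hypothetical names, and the model ignores claim invariants and only tracks which objects a claim keeps closed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical runtime model of a claimable object. */
typedef struct Obj {
    bool closed;
    unsigned ref_cnt;   /* number of closed claims that claim this object */
} Obj;

/* Hypothetical claim: it remembers the object it depends on. */
typedef struct Claim {
    Obj *claimed;
    bool closed;
} Claim;

/* Creating a claim requires the object to be closed and bumps its count. */
bool claim_create(Claim *c, Obj *o)
{
    if (!o->closed)
        return false;
    c->claimed = o;
    c->closed = true;
    o->ref_cnt++;
    return true;
}

/* Destroying a claim releases its hold on the claimed object. */
void claim_destroy(Claim *c)
{
    assert(c->closed && c->claimed->ref_cnt > 0);
    c->claimed->ref_cnt--;
    c->closed = false;
}

/* An object may be opened only when no claim depends on it. */
bool obj_try_open(Obj *o)
{
    if (!o->closed || o->ref_cnt != 0)
        return false;
    o->closed = false;
    return true;
}
```

As long as one claim exists, obj_try_open fails, which is why the holder of a claim can rely on the claimed object staying closed.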
Function Contracts and Framing Verification in VCC is function-modular ;
when reasoning about a function call, VCC uses the specification of the function,
rather than the function body. A function specification consists of preconditions
(of the form requires(p)), postconditions (of the form ensures(p), where p is a
2-state predicate, the prestate referring to the state on function entry), and writes
clauses (of the form writes(o), where o is an object reference or a set of object
references). VCC generates appropriate verification conditions to make sure that
the writes clauses are not violated.
Binding to C The discussion above assumed that we are in a world of unaliased
objects. To deal with the real C memory state, VCC maintains in ghost state a
global variable called the typestate that keeps track of where the “real” objects are;
these objects correspond to instances of C aggregate types (structs and unions).
(Variables of primitive types that are not fields of such objects are put into artificial
ghost objects or ghost arrays.) There are system invariants that 1. each memory
cell is part of exactly one object in the typestate, 2. if a struct is in the typestate,
then each of its subobjects (e.g., fields of aggregate type) is in the typestate,
and 3. if a union is in the typestate, then exactly one of its subobjects is in the
typestate. These invariants guarantee that if two objects overlap, then they are
either identical or one object is a descendant of the other in the object hierarchy.
When an object reference is used (other than as the target of an assignment), it is
asserted that the reference points to an object in the typestate. Thus, the typestate
gets rid of all of the “uninteresting” aliasing (like objects of the same type partially
overlapping).
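The consequence of the typestate invariants for overlapping objects can be checked on byte extents. This is a sketch, not VCC's typestate: Extent, overlaps, contains and well_nested are hypothetical names, and containment of byte ranges is used as a necessary (not sufficient) stand-in for "is a descendant in the object hierarchy".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical byte extent [start, start + size) of an object. */
typedef struct Extent { size_t start, size; } Extent;

/* Two extents overlap if they share at least one byte. */
bool overlaps(Extent a, Extent b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* `inner` lies completely inside `outer` (identical extents included). */
bool contains(Extent outer, Extent inner)
{
    return outer.start <= inner.start
        && inner.start + inner.size <= outer.start + outer.size;
}

/* Admissible pair: disjoint, identical, or one nested in the other.
   Partial overlap is the "uninteresting" aliasing that is ruled out. */
bool well_nested(Extent a, Extent b)
{
    return !overlaps(a, b) || contains(a, b) || contains(b, a);
}
```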
8.4 A Polymorphic Specification of IPC
In this section we verify the implementation of a simple communication algorithm
between two threads. The threads exchange data over a shared but sequentially
accessed message box to which they synchronize access with a Boolean volatile
notification flag. To verify the implementation’s memory safety, an ownership
discipline must be realized in which the ownership of the message box is trans-
ferred back and forth between the two threads. We extend this pattern by passing
claims between the two threads, which we store in the message box. The prop-
erties of these claims can be configured by the clients, thus providing the desired
polymorphic procedure call semantics for IPC.
There are various ways to structure annotations and, in particular, the defini-
tions of ghost objects and invariants. At their core, all of these share information
via volatile fields, pass on knowledge via claims or object invariants, and make
use of thread-approved state for the two communication partners. We chose here
a way that is easy to present but also extends cleanly to multiple senders and
receivers (cf. Section 8.6).
Scenario We consider the scenario of two threads (0 and 1) exchanging data
over a shared message box (of type MsgBox). The message box contains two fields
(in and out) which are used for sending a request to the other thread and receiving
back a response, respectively. The fields of the message box are nonvolatile and
accessed sequentially. The message box is contained in another structure (of
type Mgr), which additionally holds a volatile Boolean notification flag n used
to synchronize access to the message box. Given the canonical conversion of
Booleans to integers (where 0 and 1 are mapped to false and true, respectively),
this flag identifies the currently acting thread. If set, thread 1 is acting, i.e.,
preparing a response for thread 0 and posting a new request, and thread 0 may
not access the message box. Otherwise, thread 0 is acting and thread 1 may not
access the message box. Thread 0 may not clear the flag, and thread 1 may not
set it.

  spec(typedef struct vcc(record) InOut {
    unsigned val; mathint gval; claim_t cl; } InOut;)

  typedef struct MsgBox {
    unsigned in, out;
    spec(InOut input, output;)
    invariant(input.val ≡ in ∧ output.val ≡ out)
    invariant(input.cl ≠ output.cl ∧
      input.cl ∈ owns(this) ∧ ref_cnt(input.cl) ≡ 0 ∧
      output.cl ∈ owns(this) ∧ ref_cnt(output.cl) ≡ 0) } MsgBox;

Listing 1: Message Box Type with Invariants

The implementation has two functions. Both take a Mgr pointer and a thread
identifier a. The function snd() is meant to be called by thread a when the noti-
fication flag equals a. It negates the notification flag, thus sending the response
and a new request contained in the message box at that time to the other thread.
The function rcv() waits in a busy loop until the notification flag equals a again,
thus receiving the other thread’s response (to a preceding snd() call) and a new
request.
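Stripped of all ghost annotations, the two functions can be sketched as follows. This is an illustrative approximation and not the verified code: the use of C11 <stdatomic.h> for the notification flag is our assumption (the verified code uses a volatile flag), and the assertion in snd() merely checks the calling convention at run time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct MsgBox {
    unsigned in, out;   /* request and response, accessed sequentially */
} MsgBox;

typedef struct Mgr {
    atomic_bool n;      /* notification flag: identifies the acting thread */
    MsgBox msgBox;
} Mgr;

/* Called by thread a while the flag equals a: negating the flag hands the
   message box (response and new request) to the other thread. */
void snd(Mgr *mgr, bool a)
{
    assert(atomic_load(&mgr->n) == a);
    atomic_store(&mgr->n, !a);
}

/* Busy-wait until the flag equals a again, i.e., until the other thread
   has sent its response and a new request. */
void rcv(Mgr *mgr, bool a)
{
    while (atomic_load(&mgr->n) != a) {
        /* spin */
    }
}
```

A typical round trip is: thread 0 fills msgBox.in and calls snd(mgr, 0) followed by rcv(mgr, 0), while thread 1 calls rcv(mgr, 1), fills msgBox.out, and calls snd(mgr, 1).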
Message Box Listing 1 shows the annotated definition of the message box
type. As outlined above, we want to generalize information exchange beyond
the mere transfer of data (the fields in and out in the message box). We
therefore define an abstract I/O type (InOut) that carries a ghost value gval of
unbounded integer type, and a claim pointer cl in addition to the implementation
data value val being transmitted.
An abstract input and an abstract output are each stored in ghost fields of
the message box. We maintain an invariant that the input’s and output’s val
fields match their implementation counterpart. We also require that the claims
pointed to by the input’s and output’s cl fields do not alias and are owned by the
message box with a zero reference count.
The latter fact is particularly important. Whoever owns the message box also
controls the contained claims, and may make use of the knowledge / property they
hold or destroy them. The main functionality of the verified algorithm is thus the
transfer of ownership of the message box between the two threads, making sure
that the contained data has the desired properties, as instantiated by the client.
Actors The Actor type keeps track of the protocol state of a protocol partic-
ipant. Listing 2 shows the annotated definition of this type. The actor has a
nonvolatile pointer mgr to the manager, which will hold all protocol invariants.

  spec(typedef struct vcc(volatile_owns) Actor {
    struct Mgr *mgr;
    volatile bool w;
    volatile InOut l_input, r_input;
    invariant(closed(this) ∨ ¬closed(mgr))
    invariant(approves(owner(this), w, l_input, r_input))
    invariant(approves(mgr, owns(this), w, l_input, r_input)) } Actor;)

Listing 2: Actor Type and Invariants

  typedef struct Mgr {
    volatile bool n;
    MsgBox msgBox;
    spec(Actor A[2];
      bool InP[bool][InOut];
      bool OutP[bool][InOut][InOut];)
    invariant(∀(unsigned a; a < 2 ⟹ closed(&A[a]) ∧ A[a].mgr ≡ this))
    invariant(A[¬n].w)
    invariant(A[n].w
      ? &msgBox ∈ owns(&A[n]) ∧ OutP[n][A[n].l_input][msgBox.output] ∧
        A[¬n].l_input ≡ msgBox.input ∧ InP[n][msgBox.input]
      : A[n].r_input ≡ A[¬n].l_input) } Mgr;

Listing 3: Manager Type and Invariants

For admissibility reasons, the actor must promise to stay closed longer than mgr.
All other fields are volatile and may be atomically updated while the actor remains
closed. Such updates, however, must be approved by two parties: the manager
mgr, which checks all the protocol invariants, and the owner of the actor, which
is one of the communicating threads and exclusive writer of the fields. The actor
is also used as an intermediate owner of the message box during ownership trans-
ferral. For this purpose, its owns set is also declared volatile as well as approved
by mgr but not thread-approved, to enable foreign updates by other threads.
The three regular fields of the actor are used as follows. The wait flag w
is active when the thread owning the actor is waiting for a response from the
other thread. The fields l_input and r_input buffer (abstract) local and remote
inputs, i.e., input to the last request sent to or received from the other thread
(or, in other words: the evaluation of the call parameters from the caller’s and
callee’s perspective, respectively). In contrast to the input fields of the message
box itself, which may be opened and updated sequentially by the owning thread,
these buffers can be admissibly referred to all the time and used in the protocol
invariants.
Manager Listing 3 shows the annotated declaration of the Mgr type. In addi-
tion to the implementation fields, we also add some ghost components for the
verification: the maps InP and OutP encoding pre- and postconditions for the
message exchange, and a two-element array A of actors.
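[Figure 8.1: Object Structure and Ownership Transfer. The diagram shows
Thread 0 and Thread 1 with their claims c0 and c1, the actors &A[0] and &A[1],
the manager mgr and the message box msg, connected by "claims", "approves"
and "owns" edges; the labels 1. to 4. mark the protocol phases.]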
The predicates are declared nonvolatile, which allows clients to deduce that
they remain unchanged as long as the manager object is closed. They take a
Boolean parameter identifying the actor and one or two abstract input-output
values, respectively. The intention is that InP[a][i] is true iff i is a valid request for thread a
(i.e., if the request meets the precondition of the service), and OutP[a][i][o] is
true iff o is a valid response to a (valid) request i made by thread a (i.e., if the
response meets the postcondition corresponding to the call). The IPI transport
code is polymorphic with respect to these predicates, the concrete definition of
which can be provided by the client at initialization.
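The polymorphism with respect to the two predicates can be approximated at run time with function pointers. This is a sketch under stated assumptions: in the verified code InP and OutP are ghost maps, whereas here InPred, OutPred, Contract, exchange_ok and the example contract (small requests, squared responses) are hypothetical, and the ghost parts of InOut are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Abstract input-output value; ghost value and claim are omitted here. */
typedef struct InOut { unsigned val; } InOut;

/* Hypothetical runtime counterparts of the ghost maps InP and OutP. */
typedef bool (*InPred)(bool a, InOut i);
typedef bool (*OutPred)(bool a, InOut i, InOut o);

/* Contract supplied by the client at initialization. */
typedef struct Contract { InPred in_p; OutPred out_p; } Contract;

/* Example client contract: requests are small, responses are squares. */
bool small_request(bool a, InOut i)
{
    (void)a;
    return i.val < 1000u;
}

bool is_square(bool a, InOut i, InOut o)
{
    (void)a;
    return o.val == i.val * i.val;
}

/* Transport-side check: it applies the predicates without knowing what
   service they describe. */
bool exchange_ok(const Contract *c, bool a, InOut req, InOut resp)
{
    return c->in_p(a, req) && c->out_p(a, req, resp);
}
```

Replacing small_request and is_square by other predicates changes the service contract without touching exchange_ok, which mirrors how the transport code stays fixed while clients instantiate InP and OutP.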
We now describe the manager’s invariants. As described above, each protocol
partner a owns its corresponding actor &A[a]. The first invariant states that both
actors remain closed and point back to the manager, which (in combination with
the actor’s approval invariants) allows us to admissibly talk about the actors in
invariants here.
The remaining invariants define the protocol behavior. For an overview, refer
to Figure 8.1 depicting object structure and a protocol run starting from thread 0.
In addition to the objects already introduced, each (client) thread i owns a claim ci
that guarantees the manager structure to be closed. In phase 1, thread 0 owns
the message box and may prepare its response and new request. In phase 2,
ownership of the message box has passed from thread 0 to the actor of thread 1,
waiting to be processed. Phases 3 and 4 are symmetrical: in phase 3 thread 1
prepares its response, which is then waiting to be processed in phase 4.
In addition to ownership, the protocol invariants restrict values for the actor
fields. The second invariant states that the non-acting thread, identified by the negated
notification flag, must be waiting, i.e., have the wait flag of its actor set.
The third invariant refers to the acting thread, given by the notification flag.
The fact that the acting thread is waiting indicates that the message box is still
waiting to be processed by the acting thread. It holds a response to the acting
thread’s last request in the output field and a new request in the input field. In the
corresponding invariant we state that 1. the message box is owned by the current
actor, 2. its output is valid with respect to the acting thread’s last (locally stored)
request, and 3. the new input equals the local input buffer of the other thread and
is valid for the acting thread. If the acting thread is not waiting, we require the
remote input buffer of the current actor and the local input buffer of the
non-current actor to match. Note that these input buffers are approved by the
acting and non-acting threads, respectively. Thus, this condition states that
request inputs may not be changed while the request has not yet been processed.

  void snd(struct Mgr *mgr, bool a spec(claim_t c))
    requires(wrapped(c))
    requires(claims(c, closed(mgr)))
    requires(wrapped(&mgr→A[a]))
    ensures(wrapped(&mgr→A[a]))
    requires(¬mgr→A[a].w)
    requires(wrapped(&mgr→msgBox))
    requires(mgr→OutP[¬a][mgr→A[a].r_input][mgr→msgBox.output])
    requires(mgr→InP[¬a][mgr→msgBox.input])
    writes(&mgr→msgBox, &mgr→A[a])
    ensures(mgr→A[a].l_input ≡ old(mgr→msgBox.input))
    ensures(mgr→A[a].w)
  {
    atomic (c, mgr, &mgr→A[0], &mgr→A[1]) {
      assert(¬mgr→A[a].w ∧ mgr→A[¬a].w ∧ mgr→n ≡ a);
      mgr→n = ¬a;
      spec(mgr→A[a].l_input = mgr→msgBox.input;
        mgr→A[a].w = true;
        set_closed_owner(&mgr→msgBox, &mgr→A[¬a]);
        bump_vv(&mgr→A[a]); /* technicality */
      )
    }
  }

Listing 4: Send function with contract

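The second and third manager invariants can be replayed on a small executable model of the ghost state. This is a sketch under simplifying assumptions: Model, Owner, mgr_invariant, model_snd and model_rcv are hypothetical names, abstract inputs are reduced to plain integers, and the InP/OutP conjuncts are left out. The two update functions mirror the ghost updates in Listings 4 and 5.

```c
#include <assert.h>
#include <stdbool.h>

enum Owner { THREAD0, THREAD1, ACTOR0, ACTOR1 };

typedef struct Model {
    bool n;                 /* notification flag */
    bool w[2];              /* wait flags of the two actors */
    unsigned l_input[2];    /* local input buffers */
    unsigned r_input[2];    /* remote input buffers */
    unsigned box_input;     /* abstract input stored in the message box */
    enum Owner box_owner;   /* current owner of the message box */
} Model;

/* Second and third manager invariant, without the InP/OutP conjuncts. */
bool mgr_invariant(const Model *m)
{
    bool act = m->n, other = !m->n;
    if (!m->w[other])
        return false;                   /* non-acting thread must wait */
    if (m->w[act])                      /* box not yet picked up */
        return m->box_owner == (act ? ACTOR1 : ACTOR0)
            && m->l_input[other] == m->box_input;
    return m->r_input[act] == m->l_input[other];
}

/* Ghost updates of snd() for thread a. */
void model_snd(Model *m, bool a)
{
    assert(m->n == a && !m->w[a]);
    m->n = !a;
    m->l_input[a] = m->box_input;
    m->w[a] = true;
    m->box_owner = a ? ACTOR0 : ACTOR1; /* owned by the other thread's actor */
}

/* Ghost updates of a successful poll in rcv() for thread a. */
void model_rcv(Model *m, bool a)
{
    assert(m->n == a && m->w[a]);
    m->r_input[a] = m->l_input[!a];
    m->w[a] = false;
    m->box_owner = a ? THREAD1 : THREAD0; /* box is handed to thread a */
}
```

Starting from a state in which thread 0 is acting and not waiting, model_snd followed by model_rcv for thread 1 preserves mgr_invariant, whereas changing l_input[0] afterwards breaks it. This is the sense in which request inputs are frozen until the request has been processed.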
Operations The (annotated) implementation and the contracts for the send
and receive function are given in Listings 4 and 5. Both functions take a manager
pointer mgr, an actor identifier a, and a claim c supplied as a ghost parameter
stating that the manager is closed. They maintain that the identified actor is
wrapped. To send to the other thread, the current thread’s actor must be flagged
as non-waiting, the message box must be wrapped and hold valid outputs and
inputs to the other thread, just as we have seen in the manager invariant for
the acting thread. Afterwards, the message box is no longer known to be wrapped (the
writes clause on &mgr→msgBox destroys that knowledge), but the input sent to
the other thread is buffered in the local input field of the actor (and the current
thread’s actor is flagged as waiting).
Given a waiting actor, the receive function is guaranteed to return a wrapped
message box, that contains a valid response for the old local request and a new
valid request.

  void rcv(struct Mgr *mgr, bool a spec(claim_t c))
    requires(wrapped(c))
    requires(claims(c, closed(mgr)))
    requires(wrapped(&mgr→A[a]))
    ensures(wrapped(&mgr→A[a]))
    requires(mgr→A[a].w)
    writes(&mgr→A[a])
    ensures(¬mgr→A[a].w)
    ensures(wrapped(&mgr→msgBox))
    ensures(mgr→OutP[a][old(mgr→A[a].l_input)][mgr→msgBox.output])
    ensures(mgr→A[a].r_input ≡ mgr→msgBox.input)
    ensures(mgr→InP[a][mgr→msgBox.input])
  {
    unsigned tmp;
    do
      invariant(mgr→A[a].w)
      invariant(wrapped(&mgr→A[a]))
      invariant(mgr→A[a].l_input ≡ old(mgr→A[a].l_input))
      atomic (c, mgr, &mgr→A[0], &mgr→A[1]) {
        tmp = mgr→n;
        spec(if (tmp ≡ a) {
          mgr→A[a].r_input = mgr→A[¬a].l_input;
          mgr→A[a].w = false;
          giveup_closed_owner(&mgr→msgBox, &mgr→A[a]);
          bump_vv(&mgr→A[a]); /* technicality */
        })
      }
    while (tmp ≠ a);
  }

Listing 5: Receive function with contract

As a verification example consider the snd() function from Listing 4. VCC au-
tomatically verifies that its implementation fulfills the contract. The code consists
of a single atomic update on the actors and the manager (where the closedness of
the manager and the foreign actor is guaranteed by the claim c). The precondition
on the wait flag, its thread-approval, and the manager’s invariant allow us to derive
that the current thread is still not waiting, the other thread is waiting, and the
notification flag equals a just before the atomic operation.5 Also, the message
box, which is in the sequential domain of the thread, must still be wrapped and
continues to satisfy the communication preconditions. The notification flag is
then flipped (changing the ‘acting’ thread) and the ghost updates ensure that the
atomic update satisfies the manager’s invariant (e.g., by transferring ownership of
the message box from the current thread to the other thread’s actor).
The verification of the rcv() function is similar. In addition to the atomic
statement, appropriate invariants have to be specified for the loop that polls on
the notification flag.
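Stripped of all ghost state, the control flow of the two primitives reduces to a handshake on the notification flag. The following is a minimal sketch of that handshake in C11, assuming sequentially consistent atomics; the names Chan, chan_init, chan_snd, chan_rcv_try and chan_rcv are hypothetical stand-ins introduced here, and ownership, claims and the communication predicates are deliberately omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, annotation-free stand-in for struct Mgr. */
typedef struct {
    atomic_uint n;   /* notification flag: id (0 or 1) of the acting thread */
    unsigned    msg; /* stand-in for msgBox; touched only by the acting thread */
} Chan;

static void chan_init(Chan *ch, unsigned first_actor)
{
    atomic_init(&ch->n, first_actor);
    ch->msg = 0;
}

/* Send as actor a. Precondition: a is the acting thread (n == a). */
static void chan_snd(Chan *ch, unsigned a, unsigned payload)
{
    assert(atomic_load(&ch->n) == a); /* illustration only; VCC derives this */
    ch->msg = payload;                /* fill the box while we still own it */
    atomic_store(&ch->n, 1u - a);     /* flip the flag: the other thread acts */
}

/* One polling step of receive: succeeds iff the flag names a. */
static bool chan_rcv_try(Chan *ch, unsigned a, unsigned *out)
{
    unsigned tmp = atomic_load(&ch->n); /* corresponds to tmp = mgr->n */
    if (tmp != a)
        return false;
    *out = ch->msg;                     /* the box is ours again */
    return true;
}

/* Blocking receive: the polling loop of the rcv() function. */
static unsigned chan_rcv(Chan *ch, unsigned a)
{
    unsigned out;
    while (!chan_rcv_try(ch, a, &out))
        ;
    return out;
}
```

In this sketch the message box is written before the flag is flipped and read only after the flag has been observed, which mirrors the ownership transfer that the ghost updates express in the annotated code.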
5 The assertion is for illustration only; VCC deduces it automatically.
spec(typedef struct Tlb {
volatile mathint invalid, current;
invariant(invalid ≤ current ∧
old(invalid) ≤ invalid ∧ old(current) ≤ current)
} Tlb;)
Listing 6: TLB Model
8.5 TLB Flush Example
We implement and verify a protocol for flushing translation look-aside buffers
(TLBs) based on the communication algorithm from the previous section, demon-
strating the expressiveness of its polymorphic specification.
TLBs are per-processor hardware caches for translations from virtual to phys-
ical addresses. These translations are defined by page tables stored in memory,
which are asynchronously and non-atomically gathered by the TLBs (requiring
multiple reads and writes to traverse the page tables). Since translations are not
automatically flushed in response to edits to page tables, operating systems must
implement procedures to initiate such flushes on their own.
We think of page-table reads being marked with unique (increasing) identifiers
and model each TLB as an object with two volatile counters,6 cf. Listing 6. The
current counter increases as the TLB gathers new translations. The invalid counter
is a watermark for invalidated translations and is bumped (i.e., copied from the
current field) when the associated processor issues a TLB flush.
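The two-counter model can be mimicked in executable form. The sketch below is an illustration under stated assumptions: it uses bounded uint64_t counters in place of the unbounded mathint fields, and the names TlbModel, tlb_gather, tlb_flush and tlb_step_ok are hypothetical. The helper tlb_step_ok checks the same three conjuncts as the invariant of the TLB model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Executable stand-in for the ghost TLB object (assumption: uint64_t
   instead of mathint, so overflow is ignored). */
typedef struct {
    uint64_t invalid; /* watermark for invalidated translations */
    uint64_t current; /* identifier of the most recent page-table read */
} TlbModel;

/* Two-state invariant: invalid <= current, and both counters only grow. */
static bool tlb_step_ok(const TlbModel *before, const TlbModel *after)
{
    return after->invalid <= after->current
        && before->invalid <= after->invalid
        && before->current <= after->current;
}

/* The TLB gathers a new translation: current increases. */
static void tlb_gather(TlbModel *t)
{
    TlbModel before = *t;
    t->current++;
    assert(tlb_step_ok(&before, t));
}

/* The processor issues a flush: invalid is copied from current. */
static void tlb_flush(TlbModel *t)
{
    TlbModel before = *t;
    t->invalid = t->current;
    assert(tlb_step_ok(&before, t));
}
```

Both transitions preserve the invariant by construction; a step that lowered either counter would be rejected by tlb_step_ok.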
Consider the scenario of two threads, the caller (thread 0) requesting the flush
and the callee (thread 1) performing the flush. We implement this as follows: the
caller sends a flush request by invoking the send primitive and subsequently polls
for the answer by calling the receive primitive. On the callee side, the thread polls
via receive for new flush requests. When a flush request has been received, the
callee issues a TLB flush operation and signals back, using the send primitive,
that the flush has been performed. After a completed flush operation, the flush
client (e.g., the memory manager) wants to derive that the invalid counter of the
callee’s TLB is greater than or equal to the value of the callee’s current counter
at the time of the flush operation.
We realize this scenario by embedding the IPC manager (and callee’s TLB)
into a flush manager, as shown in Listing 7. Apart from ownership, the invariants
give meaning to the input and output predicates of the communication manager.
The ghost value i.gval transmitted from the caller to the callee encodes which
translations are meant to be flushed. For the callee (a≡ true), the input predicate
states that this value is less than or equal to the current field of its TLB (since the
callee could not possibly flush translations ‘from the future’, i.e., such a request
could not be handled by the TLB flush semantics). For the caller (a≡ false), the
6 While this model is sufficiently detailed to express the semantics of (full) TLB flushes,
extensions are needed for applications that go beyond that.
typedef struct FlushMgr {
  struct Mgr mgr;
  spec(struct Tlb tlb;)
  invariant(&tlb ∈ owns(&mgr))
  invariant(mgr.InP ≡ λ(bool a; InOut i;
    a ⟹ claims(i.cl, i.gval ≤ tlb.current)))
  invariant(mgr.OutP ≡ λ(bool a; InOut i, o;
    a ∨ claims(o.cl, i.gval ≤ tlb.invalid)))
} FlushMgr;
Listing 7: Flush Manager Type and Annotations
output predicate then states that the invalid field of the callee’s TLB is greater than
or equal to that value, i.e., the requested flush has been performed. For the other
cases the input and output predicates are trivially true.
Based on this definition, the correctness of the functions sendFlush() and
receiveFlush() at caller and callee side, respectively, can be proven. The main
postcondition that is established by sendFlush() for the flush manager fmgr then
is old(fmgr→tlb.current) ≤ fmgr→tlb.invalid.
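To make the roles of the two predicates concrete, the following sketch replays one flush round trip sequentially and evaluates the properties stated by the input and output predicates as plain Boolean checks. It is an illustration only: the actual protocol is concurrent and works with claims rather than run-time checks, the send and receive steps are elided, and the names Tlb (as a plain C struct), in_pred, out_pred and flush_round_trip are assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t invalid, current; } Tlb;

/* InP: only the callee (a == true) constrains its input. */
static bool in_pred(bool a, uint64_t gval, const Tlb *tlb)
{
    return !a || gval <= tlb->current;
}

/* OutP: only the caller (a == false) learns something from the output. */
static bool out_pred(bool a, uint64_t gval, const Tlb *tlb)
{
    return a || gval <= tlb->invalid;
}

/* Sequential replay of one flush round trip; returns the requested value. */
static uint64_t flush_round_trip(Tlb *tlb)
{
    uint64_t gval = tlb->current;       /* caller: flush all reads so far */
    assert(in_pred(true, gval, tlb));   /* valid request for the callee */
    tlb->invalid = tlb->current;        /* callee: perform the TLB flush */
    assert(out_pred(false, gval, tlb)); /* valid response for the caller */
    return gval;
}
```

Since the caller chooses gval as the value of the current counter at the time of the request, the output predicate yields exactly the postcondition quoted above.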
8.6 Interprocessor Interrupts
Interprocessor interrupts (IPIs) are used in multicore operating systems or
hypervisors to implement different synchronization and communication protocols. Via
IPIs a thread executing on one processor can trigger the execution of interrupt
handlers (here: NMI handlers) on other processors. Using IPIs, a communication
protocol can be implemented, in which a caller thread sends work requests to
other processors, the callees. Such an IPI protocol is part of the Verisoft XT
academic hypervisor, where it may be used for different work types, e.g., for TLB
flushing. Thus a polymorphic specification is desirable.
By expanding the simple communication pattern introduced previously, we
specified and verified the IPI protocol (and on top of it a TLB flushing protocol)
for the academic hypervisor. There are several differences between the previous
version of the algorithm and the IPI protocol:
• More communication partners. In the simple case we had a single sender
and a single receiver. Now we have multiple communication partners, where
one sender may invoke an IPC call on many receivers, and where each
receiver may be invoked by many senders at the same time.
• No receiver polling. The callees in the IPI scenario do not poll for mes-
sages. Rather the caller invokes the callee by triggering an IPI. This is
done by writing registers of the advanced programmable interrupt controller
(APIC), which delivers the interrupts to other processors. In the work at
hand we do not yet model this hardware device.
• More concurrency. In the new setting we have another source of concur-
rency, NMI handlers which may interrupt the execution of ordinary threads.
Basically, the NMI handler code always acts as receiver or callee and the
thread code as sender or caller.
• Interlocked hardware operations. Interlocked bit operations are required
to atomically access bit vectors which may be written and read concurrently
by many threads/handlers.
Implementation Since multiple senders can send requests to multiple receivers,
we need a notification bit for each sender/receiver pair. This is implemented
by introducing one notification mask per processor. Each bit of such a mask is
associated with a specific sending processor. Thus, a sender signals a request by
setting its bit in the receiver’s notification mask. When finishing the work, the
receiver clears that bit. Many senders and receivers can write the same notification
mask in parallel, requiring the use of interlocked bit operations.
Similarly, we need one mailbox for each sender/receiver pair. Note that for
each processor pair we need two mailboxes, since both may send messages to each
other simultaneously.
In the sending code a while-loop iterates over the set of intended receivers
(encoded in a bit mask). In each iteration, the mailbox is prepared first; then a
single interlocked OR-operation atomically sets the corresponding bit in the
receiver’s notification mask to 1 and compares the previous value of the mask
with 0. If this check evaluates to true, an IPI for the receiving processor is
triggered via the APIC. Otherwise, nothing has to be done, since some other sender
has already triggered the interrupt and the handler has not returned yet.
In the receiving code (implemented as an NMI handler) a while-loop iterates
on (possibly multiple) sender requests as long as the receiver’s notification mask
is not 0. Once the work for one sender is done, the corresponding bit in the
notification mask is cleared by an interlocked AND-operation.
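The sending and receiving loops can be sketched with C11 atomics standing in for the interlocked hardware operations. This is a simplified illustration, not the hypervisor code: the APIC write is replaced by a counter, the mailbox payload is a single word, the per-pair wait flags are omitted, for-loops replace the while-loops, and all names (NPROC, notify_mask, mailbox, trigger_ipi, ipi_send, ipi_handler) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NPROC 4u

static atomic_uint notify_mask[NPROC];    /* one mask per receiving processor */
static unsigned    mailbox[NPROC][NPROC]; /* mailbox[sender][receiver] */
static unsigned    ipis_sent[NPROC];      /* stub: counts APIC IPI requests */
static unsigned    handled[NPROC][NPROC]; /* handled[receiver][sender] */

/* Stand-in for writing the APIC registers (the device is not modelled). */
static void trigger_ipi(unsigned receiver) { ipis_sent[receiver]++; }

/* Thread code on processor self: send request to every receiver in the set. */
static void ipi_send(unsigned self, unsigned receivers, unsigned request)
{
    for (unsigned r = 0; r < NPROC; r++) {
        if (r == self || !(receivers & (1u << r)))
            continue;
        mailbox[self][r] = request; /* prepare the mailbox first */
        /* Interlocked OR: set our bit and obtain the previous mask value. */
        unsigned old = atomic_fetch_or(&notify_mask[r], 1u << self);
        if (old == 0)               /* no request pending: wake the handler */
            trigger_ipi(r);
    }
}

/* NMI handler on processor self: serve requests until the mask is empty. */
static void ipi_handler(unsigned self)
{
    unsigned mask;
    while ((mask = atomic_load(&notify_mask[self])) != 0) {
        for (unsigned s = 0; s < NPROC; s++) {
            if (!(mask & (1u << s)))
                continue;
            handled[self][s] = mailbox[s][self]; /* do the requested work */
            /* Interlocked AND: clear the sender's bit when done. */
            atomic_fetch_and(&notify_mask[self], ~(1u << s));
        }
    }
}
```

Only the sender that finds the previous mask value equal to 0 triggers the interrupt; later senders rely on the handler's outer loop re-reading the mask before it returns.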
Specification The specification pattern of Section 8.4 can be straightforwardly
applied to the IPI protocol. The number of ghost objects scales linearly with
the number of processors. The structure and the invariants of message boxes
(with their ghost fields encoding input/output claims) and actors introduced in
the simple protocol can be reused almost identically in the new setting.
If n is the number of processors, 2 · n actor objects are required, since each
processor may act both as sender (when running thread code) and as receiver
(when running NMI handler code). Though executed on the same processor, the two
code portions are logically different entities, possibly residing in different
protocol states and owning different sets of mailboxes. That is also how we deal
with thread and NMI handler concurrency: the NMI handler and the thread code
each own (and thus approve) separate actors. Note that in the IPI case, a
single actor may communicate with many other partners, requiring it to maintain
protocol state (the wait flag, and the remote and local input fields) per processor.
The invariants of the manager are similar to those from the simple protocol.
Multiprocessor TLB Flush The TLB abstraction and specification are similar to
those of the previous section, but with a separate TLB for each processor.
IPIs in Microsoft’s Hyper-V™ Hypervisor In the context of the Verisoft
project we also studied the correctness of the IPI mechanism implemented in
Microsoft’s Hyper-V hypervisor. Though comparable in complexity to the IPI
routine of the academic hypervisor, there are several differences:
• Efficiency. By introducing additional protocol variables sequential access to
some of the shared data can be ensured, and thus fewer (costly) interlocked
operations are required.
• Lazy work. The interrupt handler signals the receipt of the request and
the completion of the work separately. This allows the caller code to be
implemented with less blocking.
We have verified the implementation against a non-generic specification in VCC
and are confident that this effort can be easily adapted to the generic specification
used here.
8.7 Conclusion
The verification presented here achieves the desired goal: it allows IPC clients
to reason about IPCs like local procedure calls. As future work, the structure
presented in Section 8.4 can be made modular even with respect to the set of
functions provided via IPC. We can improve the structure slightly by changing the
Mgr type; instead of the maps InP and OutP, the Mgr could hold a mapping of
function tags to function objects, where each function object has its own InP and
OutP maps. This would allow function objects to be reused in different managers,
or even dynamically registered for IPC.
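One possible shape of such a manager is sketched below. It is an analogy only: in VCC the predicates are ghost maps, whereas here they are modelled as ordinary function pointers so that the structure can be exercised at run time. All names (IpcFunction, TaggedMgr, mgr_register, mgr_valid_request, mgr_valid_response, inc_fn) are hypothetical and not part of the verified code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Executable stand-ins for the ghost maps InP and OutP (assumption). */
typedef bool (*InPred)(bool a, unsigned input);
typedef bool (*OutPred)(bool a, unsigned input, unsigned output);

/* A function object bundles the contract of one function offered via IPC. */
typedef struct {
    InPred  in_p;
    OutPred out_p;
} IpcFunction;

#define MAX_TAGS 8u

/* Hypothetical manager: maps function tags to function objects. */
typedef struct {
    const IpcFunction *fn[MAX_TAGS];
} TaggedMgr;

/* Register f under tag; fails if the tag is out of range or already taken. */
static bool mgr_register(TaggedMgr *m, unsigned tag, const IpcFunction *f)
{
    if (tag >= MAX_TAGS || m->fn[tag] != NULL)
        return false;
    m->fn[tag] = f;
    return true;
}

static bool mgr_valid_request(const TaggedMgr *m, unsigned tag, bool a,
                              unsigned input)
{
    return tag < MAX_TAGS && m->fn[tag] != NULL && m->fn[tag]->in_p(a, input);
}

static bool mgr_valid_response(const TaggedMgr *m, unsigned tag, bool a,
                               unsigned input, unsigned output)
{
    return tag < MAX_TAGS && m->fn[tag] != NULL
        && m->fn[tag]->out_p(a, input, output);
}

/* Example function object: "increment" with a bounded input for the callee. */
static bool inc_in(bool a, unsigned input) { return !a || input < 100u; }
static bool inc_out(bool a, unsigned input, unsigned output)
{
    return a || output == input + 1u;
}
static const IpcFunction inc_fn = { inc_in, inc_out };
```

The same inc_fn object can be registered with several managers under different tags, which is the kind of reuse the restructuring is meant to enable.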
In principle, the technique presented here could also be applied to RPC, where
the caller and callee execute in different address spaces. This requires translating
the claims representing the pre- and post-conditions from one address space to the
other. One possible way to achieve this effect would be to take the claim in the
caller space, couple this to a second state in a way that captures the guarantees
of the RPC, and existentially quantify away the caller space.


Bibliography
[ABP09] E. Alkassar, S. Bogan, and W. Paul. Proving the correctness of clien-
t/server software. In Sadhana Journal, volume 34, pages 145–192.
Springer, 2009.
[ACHP10] Eyad Alkassar, Ernie Cohen, Mark A. Hillebrand, and Hristo Pentchev.
Modular specification and verification of interprocess communication.
In Proceedings of 10th International Conference on Formal Methods
in Computer-Aided Design, FMCAD 2010, Lugano, Switzerland, Oc-
tober 20-23, pages 167–174, 2010.
[Alk09] Eyad Alkassar. OS Verication Extended - On the Formal Verication
of Device Drivers and the Correctness of Client/Server Software. PhD
thesis, University of Saarland, 2009.
[App12] Andrew W. Appel. Verified software toolchain. In NASA Formal
Methods - 4th International Symposium, NFM 2012, Norfolk, VA,
USA, April 3-5, 2012. Proceedings, page 2, 2012.
[Bau14] Christoph Baumann. Ownership-Based Order Reduction and Simula-
tion in Shared-Memory Concurrent Computer Systems. PhD thesis,
Saarland University, Saarbru¨cken, 2014.
[BCOP05] Richard Bornat, Cristiano Calcagno, Peter W. O’Hearn, and
Matthew J. Parkinson. Permission accounting in separation logic.
In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, POPL 2005, Long Beach, Cal-
ifornia, USA, January 12-14, 2005, pages 259–270, 2005.
[BMS96] Manfred Broy, Stephan Merz, and Katharina Spies, editors. Formal
Systems Specification, The RPC-Memory Specification Case Study
(the book grow out of a Dagstuhl Seminar, September 1994), volume
1169 of Lecture Notes in Computer Science. Springer, 1996.
[CL98] Ernie Cohen and Leslie Lamport. Reduction in TLA. In CONCUR
’98: Concurrency Theory, 9th International Conference, Nice, France,
September 8-11, 1998, Proceedings, pages 317–331, 1998.
209
210 BIBLIOGRAPHY
[CMST10] Ernie Cohen, Michal Moskal, Wolfram Schulte, and Stephan Tobies.
Local verification of global invariants in concurrent programs. In Com-
puter Aided Verification, 22nd International Conference, CAV 2010,
Edinburgh, UK, July 15-19, 2010. Proceedings, pages 480–494, 2010.
[DDG+10] Thomas Dinsdale-Young, Mike Dodds, Philippa Gardner, Matthew J.
Parkinson, and Viktor Vafeiadis. Concurrent abstract predicates. In
ECOOP 2010 - Object-Oriented Programming, 24th European Confer-
ence, Maribor, Slovenia, June 21-25, 2010. Proceedings, pages 504–
528, 2010.
[DDW09] Matthias Daum, Jan Do¨rrenba¨cher, and Burkhart Wolff. Proving fair-
ness and implementation correctness of a microkernel scheduler. J.
Autom. Reasoning, 42(2-4):349–388, 2009.
[Deg11] Ulan Degenbaev. Formal Specification of the x86 Instruction Set Ar-
chitecture. PhD thesis, Saarland University, Saarbru¨cken, 2011.
[DPS09] Ulan Degenbaev, Wolfgang J. Paul, and Norbert Schirmer. Pervasive
theory of memory. In Susanne Albers, Helmut Alt, and Stefan Na¨her,
editors, Efficient Algorithms – Essays Dedicated to Kurt Mehlhorn on
the Occasion of His 60th Birthday, volume 5760 of Lecture Notes in
Computer Science, pages 74–98. Springer, Saarbru¨cken, 2009.
[FSGD09] Xinyu Feng, Zhong Shao, Yu Guo, and Yuan Dong. Certifying low-
level programs with hardware interrupts and preemptive threads. J.
Autom. Reasoning, 42(2-4):301–347, 2009.
[Heb10] Erik Hebisch. A survey of multicore in the german software
developers community. Fraunhofer IAO, http://www.ipd.uni-
karlsruhe.de/multicore/separs/downloads/2010-12-02-SEPARS-
Stuttgart-IAOMWare-Surveys.pdf, December 2010.
[Inc06] The Santa Cruz Operation Inc. System v application binary interface -
mips risc processor supplement 3rd edition. Technical report, February
2006.
[KEH+09] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick,
David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt,
Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and
Simon Winwood. sel4: formal verification of an OS kernel. In Pro-
ceedings of the 22nd ACM Symposium on Operating Systems Princi-
ples 2009, SOSP 2009, Big Sky, Montana, USA, October 11-14, 2009,
pages 207–220, 2009.
[KLM+15] Daniel Kroening, Lihao Liang, Tom Melham, Peter Schrammel, and
Michael Tautschnig. Effective verification of low-level software with
BIBLIOGRAPHY 211
nested interrupts. In Proceedings of the 2015 Design, Automation &
Test in Europe Conference & Exhibition, DATE ’15, pages 229–234,
San Jose, CA, USA, 2015. EDA Consortium.
[KMP14] Mikhail Kovalev, Silvia M. Mueller, and Wolfgang J. Paul. A Pipelined
Multi-core MIPS Machine - Hardware Implementation and Correctness
Proof. Springer, 2014.
[Kov13] Mikhail Kovalev. TLB Virtualization in the Context of Hypervisor
Verification. PhD thesis, Saarland University, Saarbru¨cken, 2013.
[KP95] Jo¨rg Keller and Wolfgang J. Paul. Hardware Design: Formaler Entwurf
digitaler Schaltungen. Teubner, 1995.
[Lei08] Dirk Leinenbach. Compiler Verification in the Context of Pervasive
System Verification. PhD thesis, Saarland University, Saarbru¨cken,
2008.
[Ler09] Xavier Leroy. Formal verification of a realistic compiler. Communica-
tions of the ACM, 52(7):107–115, 2009.
[Lev99] John R. Levine. Linkers and Loaders. Morgan Kaufmann Publishers
Inc., 1st edition, 1999.
[LP08] D. Leinenbach and E. Petrova. Pervasive compiler verification – from
verified programs to verified systems. In 3rd intl Workshop on Systems
Software Verification (SSV08)., volume 217C of Electronic Notes in
Theoretical Computer Science, pages 23–40. Elsevier Science B. V.,
2008.
[LPP05a] D. Leinenbach, W. Paul, and E. Petrova. Towards the formal veri-
fication of a c0 compiler: Code generation and implementation cor-
rectness. In 3rd International Conference on Software Engineering and
Formal Methods (SEFM 2005), Koblenz, Germany, 2005.
[LPP05b] D. Leinenbach, W. Paul, and E. Petrova. Towards the formal veri-
fication of a C0 compiler: Code generation and implementation cor-
rectness. In 3rd International Conference on Software Engineering
and Formal Methods (SEFM 2005), 5-9 September 2005, Koblenz,
Germany, 2005.
[LT93] Nancy G. Leveson and Clark Savage Turner. Investigation of the
therac-25 accidents. IEEE Computer, 26(7):18–41, 1993.
[Mic09] Microsoft Corporation., http://vcc.codeplex.com/. VCC: A C Veri-
fier., 2009.
212 BIBLIOGRAPHY
[MP00] Silvia M.Mu¨ller and Wolfgang J. Paul. Computer Architecture: Com-
plexity and Correctness. Springer, 2000.
[Pau07] Wolfgang Paul. System Architecture (Lecture Notes). URL:
http://busserver.cs.uni-sb.de/lehre/vorlesung/
info2/ss08/material/mitschrift07.pdf, 2007.
[PBLS16] Wolfgang Paul, Christoph Baumann, Petro Lutsyk, and Sabine
Schmaltz. System architecture as an ordinary engineering discipline.
URL: http://www-wjp.cs.uni-saarland.de/lehre/
vorlesung/rechnerarchitektur/ws15/books/sysbook.
pdf, 2016.
[Pra95] Vaughan R. Pratt. Anatomy of the pentium bug. In TAPSOFT’95:
Theory and Practice of Software Development, 6th International Joint
Conference CAAP/FASE, Aarhus, Denmark, May 22-26, 1995, Pro-
ceedings, pages 97–107, 1995.
[PSS12] Wolfgang Paul, Sabine Schmaltz, and Andrey Shadrin. Completing
the automated verification of a small hypervisor - assembler code
verification. In George Eleftherakis, Mike Hinchey, and Mike Hol-
combe, editors, 10th International Conference, SEFM 2012, Thessa-
loniki, Greece, October 1-5 (Software Engineering and Formal Meth-
ods), volume 7504 of Lecture Notes in Computer Science, pages 188–
202. Springer Berlin Heidelberg, 2012.
[Sch13] Sabine Schmaltz. Towards the Pervasive Formal Verification of Multi-
Core Operating Systems and Hypervisors Implemented in C. PhD
thesis, Saarland University, Saarbru¨cken, 2013.
[Sha12] Andrey Shadrin. Mixed Low- and High Level Programming Languages
Semantics. Automated Verification of a Small Hypervisor: Putting It
All Together. PhD thesis, Saarland University, Saarbru¨cken, 2012.
[SS12] Sabine Schmaltz and Andrey Shadrin. Integrated semantics of
intermediate-language c and macro-assembler for pervasive formal ver-
ification of operating systems and hypervisors from verisoftxt. In Ra-
jeev Joshi, Peter Mu¨ller, and Andreas Podelski, editors, 4th Inter-
national Conference on Veried Software: Theories, Tools, and Ex-
periments, VSTTE’12, volume 7152 of Lecture Notes in Computer
Science, Philadelphia, USA, 2012. Springer Berlin / Heidelberg.
[Sta] Statista GmbH, http://www.statista.com.
[Sta10] Artem Starostin. Formal Verification of Demand Paging. PhD thesis,
Saarland University, SaarbrA˜14cken, 2010.
BIBLIOGRAPHY 213
[Tsy09] Alexandra Tsyban. Formal Verification of a Framework for Microkernel
Programmers. PhD thesis, Saarland University, Saarbru¨cken, 2009.
[Ver] Verisoft Consortium., http://www.verisoft.de. Verisoft Project.
