Formal virtualization requirements for the ARM architecture by Penneman, Niels et al.
Formal virtualization requirements for the ARM architecture
Niels Pennemanᵃ,ᵇ,¹, Danielius Kudinskasᶜ, Alasdair Rawsthorneᶜ, Bjorn De Sutterᵃ, Koen De Bosschereᵃ
ᵃ Computer Systems Lab, Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium
ᵇ Vrije Universiteit Brussel (VUB), Dept. of Electronics and Informatics (ETRO), Pleinlaan 2, B-1050 Brussels, Belgium
ᶜ School of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
Abstract
We present an analysis of the virtualizability of the ARMv7-A architecture carried out in the context of the seminal paper published
by Popek and Goldberg 38 years ago. Because their definitions are dated, we first extend their machine model to modern architectures
with paged virtual memory, IO and interrupts. We then use our new model to show that ARMv7-A is not classically virtualizable.
Insights such as binary translation enable efficient virtualization beyond the original criteria. Companies are also making their
architectures virtualizable through extensions. We analyse both approaches for ARM and conclude that both have their use in future
systems.
Keywords: Binary translation, Hypervisor, Instruction set architecture, Virtualization, Virtual machine monitor
1. Introduction
The term virtualization is used in many different con-
texts, ranging from storage or network technologies to ex-
ecution environments and even to virtual realities. This
paper refers to system virtualization—technology which
allows multiple operating systems to be executed on the
same physical machine simultaneously. It achieves this
by decoupling the underlying hardware from the operat-
ing systems, which are executed deprivileged as shown
in Figure 1. These operating systems are now called guest
operating systems. In general, a guest or virtual machine
(VM) refers to all software that is executed deprivileged
and in an isolated environment under the control of a virtual
machine monitor (VMM).
Figure 1: Overview of a virtualized system.
In 1974, Popek and Goldberg [44] wrote a classic paper
on “Formal requirements for virtualizable third generation
architectures”. It defines strict system virtualization
criteria for computer architectures, and proves that
if they are met, an “efficient” VMM can be constructed
for that architecture. Since its publication, the paper has
been used as a solid reference point for designing hardware
platforms capable of supporting an efficient VMM
and it has become the groundwork of virtualization technology.
Email addresses: Niels.Penneman@elis.UGent.be (Niels
Penneman), kudinsd6@cs.man.ac.uk (Danielius Kudinskas),
Alasdair.Rawsthorne@manchester.ac.uk (Alasdair Rawsthorne),
Bjorn.DeSutter@elis.UGent.be (Bjorn De Sutter),
Koen.DeBosschere@elis.UGent.be (Koen De Bosschere)
¹ Niels Penneman is supported by the agency for Innovation by Science
and Technology (IWT).
The criteria defined by Popek and Goldberg are now
known as the conditions for classic virtualizability. Ar-
chitectures that meet these criteria are particularly suited
for full virtualization. Full virtualization (also known as
pure or faithful virtualization) is a technique in which the
VMM presents a virtual platform that is an exact replica
of the real hardware to each of its guests. Hence, applica-
tions and operating systems do not require any modifica-
tion to support virtualization.
Popek and Goldberg based their model on comput-
ers available at the time, such as DEC PDP-10 and IBM
360/67. Due to advances in microprocessor architecture,
their model no longer fits current architectures. Fur-
thermore, new approaches in the construction of efficient
VMMs that do not fit Popek and Goldberg’s model and
criteria [24], have enabled virtualization of many more
contemporary architectures.
Virtualization in the server and desktop world has al-
ready matured, with both software and hardware solu-
tions available for several years [1, 14, 16, 45, 48, 50, 53].
However, virtualization on embedded systems has only
been explored for the past seven years, and is an area of
ongoing research [22, 29]. Solutions for data centres and
desktop computing cannot be readily applied to embed-
ded systems, because of differences in requirements, use
cases, and computer architecture. We provide a new per-
spective on the Popek and Goldberg virtualizability re-
quirements. We develop a model that focuses on modern
microprocessors and use it to analyse the ARM architecture.
We illustrate that our model can also help in the construction
of VMMs based on dynamic binary translation
(DBT) by using our analysis to show the pitfalls for con-
structing a DBT VMM on ARM.
ARM is by far the leading architecture in the embedded
and mobile market [47], and recently multiple software
virtualization solutions have been developed for it,
such as Codezero, OKL, Red Bend VLX, VIRTUS, VMware
MVP and Xen-ARM [11, 12, 15, 28, 31, 34, 37]. Due to architectural
problems, none of the currently deployed solutions
offers full virtualization [51].
ARM has already published a preliminary specification
of extensions to its architecture, which will facilitate full
virtualization [8]. The appearance of hardware platforms
that implement these extensions is imminent. It is therefore
important to understand the problems faced by virtualization
technology on ARM today, and evaluate the possible
solutions against the formal requirements derived
by Popek and Goldberg [44] nearly 40 years ago. Popek
and Goldberg also proposed techniques to virtualize architectures
which did not meet their criteria. VMMs based
on DBT, which do not require changes to the architec-
ture, can be regarded as an evolution of their techniques.
Their analysis therefore remains useful for the construc-
tion of such VMMs and for understanding the advan-
tages and disadvantages of both the HW supported and
DBT supported approaches. On the one hand, hardware
extensions enable simple VMM implementations but are
tied to specific architectures and platforms. On the other
hand, DBT is notorious for complicating VMM implemen-
tations, but is known to be versatile and usable in situa-
tions where hardware extensions would be infeasible or
impractical.
This paper analyses the application profile of the current
ARM architecture, ARMv7-A, because it serves as the
base for the upcoming virtualization extensions [8]. Our
model excludes micro-controllers for deeply embedded
systems. Those architectures often lack advanced mem-
ory management features and hence they are incapable of
running full-blown operating systems.
Our contributions include:
• an update to the model of Popek and Goldberg with
paged virtual memory, IO, and interrupts, and the
application of this model to analyse the ARMv7-A
architecture both without and with the virtualization
extensions;
• an example of how an analysis according to our
model helps in the construction of a DBT VMM;
• a discussion of the trade-offs between the use of architectural
extensions for virtualization and DBT, in
which we argue that both have their use in future systems.
The rest of this paper is organized as follows: Section 2
discusses our motivation and gives a quick overview of
the model introduced by Popek and Goldberg. In Section 3
we extend this model to include paged virtual memory,
IO, and interrupts. We use the updated model to
show that the ARM architecture is not classically virtualizable
in Section 4. In Section 5 we analyse ARM’s upcoming
hardware support and discuss how ARM can be fully
virtualized using DBT. We show how our analysis from
Section 4 helps in the construction of a DBT VMM. We
then compare the use of hardware extensions with DBT-
based approaches on a theoretical level.
2. Background and motivation
2.1. Existing virtualization solutions
Paravirtualization is the only virtualization technique
deployed in today’s ARM-based embedded systems. In
paravirtualization, a VMM presents a custom interface to
its VMs which is similar, but not identical to the underly-
ing hardware [54].
Despite the popularity of paravirtualization, none of
the efforts to standardize the required VMM interfaces
has gained sufficient momentum to spread to more than
a few operating systems or VMMs [3, 38, 46]. As a con-
sequence, the majority of contemporary embedded oper-
ating systems do not support such interfaces out of the
box. Instead, VMM vendors and third parties must provide
source code patch sets to support specific VMM interfaces
for specific operating system versions. This situation
comes with four major drawbacks:
1. Developing, maintaining, and testing patch sets for
each and every combination of a specific operating
system version and a VMM interface is an expensive
process. Although semantic patches may offer a so-
lution to simplify patch management [10], the effort
required for testing remains.
2. Patched operating systems may exhibit unexpected
behaviour because their reliability is not guaranteed
and patches may introduce new security issues.
3. Licenses may prevent or restrict modifications to operating
system source code, and often impose rules
on the distribution of patch sets or patched code.
4. Previously certified software stacks will need to be recertified
after patching. The recertification process is
expensive and always specific to a particular VMM
interface, thereby stimulating vendor lock-in.
This analysis is shared by major players in industry in-
cluding ARM, Nokia and STMicroelectronics [19].
2.2. Classic virtualizability
The drawbacks of paravirtualization can be avoided by
using full virtualization. With full virtualization, guest
operating systems run unmodified on top of a VMM. This
also enables virtualization of a priori unknown software.
Popek and Goldberg [44] formally derived sufficient (but
not necessary) criteria that determine whether an archi-
tecture is suitable for full virtualization. In this section,
we briefly explain their model and results.
The machine model used by Popek and Goldberg is de-
liberately simplified. It includes a processor and a linearly
addressable memory, but does not consider interaction
with IO devices and interrupts.
The processor operates in either supervisor mode or
user mode. Supervisor mode is a privileged mode, meant
for the operating system, while user mode is unprivi-
leged. The processor also features a program counter and
a relocation-bounds register, used for relative memory addressing.
An instruction set architecture (ISA) for such a
processor can move, look up or process data and alter program
control flow. Based on this description, they defined
the concept of machine state.
Definition 1. The machine state S is defined by the con-
tents of the memory E, the processor mode M , the pro-
gram counter P , and the relocation-bounds register R:
S ≡ 〈E,M,P,R〉 .
Since all variables that determine the machine state are
finite, the set of all machine states Σ is also finite.
Definition 2. An instruction i is a function on the set of
machine states Σ that maps one state to another:
\[ i : \Sigma \longrightarrow \Sigma, \qquad S_x \longmapsto i(S_x) = S_y. \]
All instructions are classified in three categories:
• Privileged instructions execute correctly in privileged
mode, and always trap in unprivileged mode.
• Sensitive instructions are further classified as control-
sensitive and behaviour-sensitive instructions. Control-
sensitive instructions attempt to modify the proces-
sor execution mode or the amount of available mem-
ory resources. Behaviour sensitivity manifests itself
in two ways. The result of behaviour-sensitive in-
structions either depends on their location in physi-
cal memory (location sensitivity) or on the processor
execution mode (mode-sensitivity).
• Innocuous instructions are those instructions that are
not sensitive.
Upon a trap, control is transferred to a privileged mode,
P will point to a pre-defined trap handler, and the system
state before the trap is saved in the memory E, such that it
can be restored when the instruction that caused the trap
has been dealt with.
Popek and Goldberg also defined three fundamental
properties for virtualized systems:
1. Efficiency: all innocuous instructions are executed
natively without VMM intervention;
2. Resource control: guest software is forbidden access
to physical state and resources;
3. Equivalence: guest software behaves identically² to
when it is run on a system natively.
Using the above definitions, Popek and Goldberg
proved the following theorem:
Theorem 1 (Popek and Goldberg [44]). For any conven-
tional third generation computer, a virtual machine monitor
may be constructed if the set of sensitive instructions for that
computer is a subset of the set of privileged instructions.
On an architecture on which all sensitive instructions
are also privileged, a trap is generated and caught by the
VMM whenever a guest attempts to execute a sensitive instruction.
Such instructions must then be “interpreted”
by the VMM. All other instructions (innocuous instructions)
must be executed natively. This kind of VMM is
also known as an execute-to-trap VMM. It is this ability to
execute most of the code directly that enables “efficient”
virtualization.
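As an illustration only, the condition of Theorem 1 and the behaviour of an execute-to-trap VMM can be sketched in Python. The instruction names below form a hypothetical toy ISA and are not taken from any real architecture; the two callbacks stand in for the VMM's interpreter and for direct execution on the hardware.

```python
def is_classically_virtualizable(sensitive, privileged):
    """Theorem 1 condition: every sensitive instruction is also privileged."""
    return set(sensitive) <= set(privileged)


def run_execute_to_trap(program, privileged, interpret, execute_natively):
    """Run deprivileged guest code: privileged instructions trap to the VMM,
    which interprets them; all other instructions run directly."""
    trap_count = 0
    for instruction in program:
        if instruction in privileged:
            interpret(instruction)          # VMM emulates the effect on virtual state
            trap_count += 1
        else:
            execute_natively(instruction)   # no VMM intervention
    return trap_count


# Hypothetical toy ISA (names are illustrative assumptions).
PRIVILEGED = {"SET_MODE", "WRITE_SYSREG"}
SENSITIVE_OK = {"SET_MODE", "WRITE_SYSREG"}
SENSITIVE_BAD = SENSITIVE_OK | {"READ_MODE"}   # sensitive, but never traps

log = []
traps = run_execute_to_trap(
    ["ADD", "SET_MODE", "ADD", "WRITE_SYSREG"],
    PRIVILEGED,
    interpret=lambda i: log.append(("vmm", i)),
    execute_natively=lambda i: log.append(("cpu", i)),
)
```

With `SENSITIVE_BAD`, the hypothetical `READ_MODE` instruction would run natively and silently observe the real processor mode, which is exactly the situation the subset condition rules out.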
Popek and Goldberg also showed that if the efficiency
property is loosened, allowing interpretation of all priv-
ileged guest code, a so-called hybrid VMM can be con-
structed for otherwise non-virtualizable architectures.
To specify the conditions for the construction of a hy-
brid VMM, Popek and Goldberg defined a new subset of
sensitive instructions: user-sensitive instructions. An in-
struction is user-sensitive if there exists a state in unpriv-
ileged mode for which the instruction is sensitive. Using
the categories defined earlier, the user-sensitive instruc-
tions can further be classified as user-control-sensitive and
user-location-sensitive instructions³. Popek and Goldberg
then proved the following theorem:
Theorem 2 (Popek and Goldberg [44]). A hybrid virtual
machine monitor may be constructed for any conventional third
generation machine in which the set of user-sensitive instruc-
tions are a subset of the set of privileged instructions.
Alternatively, DBT can be used to rewrite instructions
at run time at an acceptable performance cost—some re-
searchers have proposed dynamic optimization solutions
that achieve a performance benefit by overcoming funda-
mental limitations of static compilation [13]. If the condi-
tions for the construction of a hybrid VMM are met, only
privileged guest code must be rewritten. Otherwise, full
virtualization can still be achieved by rewriting all instruc-
tions, i.e., including unprivileged guest code. Using hy-
brid VMMs and DBT, a much wider range of computer
architectures can be virtualized.
² The equivalence property only holds under the assumption that
guest software is free of timing dependencies.
³ There are no user-mode-sensitive instructions as the processor mode
is limited to the one (and only) unprivileged mode.
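The choice between the three approaches discussed in this section can be summarised in a short Python sketch. It is a simplification under stated assumptions: instruction sets are plain Python sets, the instruction names are hypothetical, and the returned labels are ours rather than established terminology.

```python
def choose_vmm_strategy(sensitive, user_sensitive, privileged):
    """Pick the least invasive approach permitted by Theorems 1 and 2."""
    if sensitive <= privileged:
        # Theorem 1: all sensitive instructions trap.
        return "execute-to-trap"
    if user_sensitive <= privileged:
        # Theorem 2: only privileged guest code needs interpretation or rewriting.
        return "hybrid"
    # Otherwise all guest code, including unprivileged code, must be rewritten.
    return "dbt-all-code"


# Hypothetical toy instruction sets (illustrative assumptions).
privileged = {"SET_MODE"}
classic = choose_vmm_strategy({"SET_MODE"}, {"SET_MODE"}, privileged)
hybrid = choose_vmm_strategy({"SET_MODE", "READ_MODE"}, {"SET_MODE"}, privileged)
full_dbt = choose_vmm_strategy({"SET_MODE", "READ_MODE"},
                               {"SET_MODE", "READ_MODE"}, privileged)
```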
2.3. Prior updates to the model
Dong and Hao [20] have attempted to extend the model
by Popek and Goldberg [44] to include interrupts and
memory-mapped IO. Because they treat the subject from
the view of user-space applications rather than the op-
erating system or the underlying computer architecture,
they exclude IO operations initiated by operating systems.
Their model introduces a set of possible IO states Γ, similar
to the set of machine states Σ, and implicitly redefines
instructions as functions with domain and range
Σ × Γ. However, they do not revise the original defini-
tions from Popek and Goldberg at all even though they
alter the model on which those definitions are based. It is
also unclear whether the proposed model can be applied
in practice. Interrupts and exceptions are vital for today’s
computer software, not only because of IO, but also be-
cause they enable communication between operating sys-
tems and their applications through software interrupts
(system calls), and because they are key to modern mem-
ory management techniques such as swapping. However,
Dong and Hao only consider interrupts and exceptions in
the context of IO. Their treatment of the subject is there-
fore incomplete.
At the time of writing, we are not aware of any other
extensions to the model.
2.4. Advances in computing practice
Over the past 40 years, computer architectures have
been significantly extended, use cases have changed, and
users’ expectations have evolved accordingly. Hence, the
model by Popek and Goldberg [44] no longer fits current
computing practice.
2.4.1. Memory relocation and protection
The original model allows for a minimal form of mem-
ory relocation and protection, using a single combined
base location and bounds register. Such a register is ca-
pable of relocating non-privileged applications under the
control of an operating system running outside a virtu-
alized environment, and can relocate operating systems
themselves (and their applications) when running un-
der the control of a VMM. Although computer systems
with paged virtual memory were available commercially
in 1974 [25], virtual memory was still seen primarily as
a technique for optimizing the utilization of expensive
physical memory. Most applications were written with-
out directly depending on the virtual memory facilities of
the operating system, so a simple relocation scheme, typ-
ically found on systems without virtual memory, was ad-
equate to model the behaviour of a real system.
Today’s operating system designers, however, exploit
virtual memory to offer applications many key facilities
such as dynamic linking, shared libraries, and dynami-
cally expandable heap and stack areas. It is no longer re-
alistic to hide the details of these facilities in a discussion
of virtualization. Therefore, we will introduce the concept
of a memory map in the definition of machine state. An
operating system will use such maps to define the memory
space accessible to applications. A VMM will also use
such maps to define the memory space accessible to its
guests. Correctness conditions for the latter are derived
in Section 3.2.
2.4.2. Timing
A second difference that has arisen is the importance of
timing in our use of computers. At the time Popek and
Goldberg [44] published their results, much of the com-
puting workload was still processed in batches, with in-
put prepared off-line and output printed for later usage.
Modern computing is significantly more time-sensitive—
much of our interaction with personal computers and
portable electronics is characterised as “soft real-time”, in
which failure to deliver a response in a timely manner is
perceived as a vexatious malfunction.
The model by Popek and Goldberg does not aim to
analyse timing-dependent behaviour. Nor does it
lend itself to studying the influence that multiple VMs running
under the same VMM exert on each other. In this paper,
we also limit the discussion to whether an architecture
lends itself to virtualization in general, for which
studying the case of a single VM is sufficient, and to how
a VMM can be constructed for that architecture. As we
have stated earlier, we do not consider deeply embedded
systems. This rules out hard real-time systems.
The underlying idea of efficiency in the original pa-
per was that in an efficient system relatively few instruc-
tions would trap. This implies that if one wants to re-
tain soft real-time behaviour, VMM interceptions must be
bounded. Whether or not a VMM of which all intercep-
tions are bounded can be constructed on a specific archi-
tecture, is out of the scope of this paper.
2.4.3. IO, interrupts and exceptions
Most IO in contemporary computer architectures is ei-
ther memory-mapped (MMIO) or port-mapped (PMIO).
IO operations may result in interrupt generation, but gen-
erally this is not necessary, unlike what is suggested by
Dong and Hao [20].
In an architecture that supports MMIO, IO device registers
are mapped into the same address space as RAM. We
put forward that resource control is retained if all accesses
to IO device registers trap or can be configured to trap a
priori. The instructions that perform these accesses will
be generic load and store instructions. However, we can-
not require all load and store instructions to be privileged,
because their sensitivity depends on the address they act
upon. Load and store instructions that operate on RAM
are clearly innocuous, given that they operate on virtual
addresses. Hence, treating all load and store instructions
as sensitive violates the efficiency property. Instead, vir-
tual memory can be used to protect device memory, and
make all accesses to such memory trap. The efficiency
property is retained if memory protection is possible at
such granularity that accesses to RAM are not affected.
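The role of protection granularity can be made concrete with a small Python sketch. The addresses, the 4 KiB and 1 MiB granularities, and the access trace are illustrative assumptions, not properties of a specific platform.

```python
def count_traps(accesses, device_addresses, granularity):
    """Count accesses that trap when every protection unit of the given
    granularity that contains a device register is made inaccessible."""
    protected_units = {addr // granularity for addr in device_addresses}
    return sum(1 for addr in accesses if addr // granularity in protected_units)


DEVICE_REGISTERS = [0x4000_0000]      # hypothetical MMIO register
TRACE = [
    0x4000_0010,   # access to the device: must trap
    0x4000_2000,   # RAM located close to the device
    0x8000_0000,   # RAM far away from the device
]

fine = count_traps(TRACE, DEVICE_REGISTERS, granularity=4 * 1024)
coarse = count_traps(TRACE, DEVICE_REGISTERS, granularity=1024 * 1024)
```

In this toy trace, protecting at 4 KiB granularity traps only the device access, whereas protecting at 1 MiB granularity also traps an innocuous RAM access and therefore erodes the efficiency property.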
PMIO is typically used in systems where the mem-
ory address space is too small to accommodate IO de-
vices. On other architectures, it may be a relic which has
been preserved for backwards compatibility. Communi-
cation with devices is carried out using dedicated IO in-
structions. Such instructions can be seen as controlling
resources; they are hence similar to control-sensitive in-
structions: it is sufficient that all dedicated IO instructions
are also privileged to retain classic virtualizability.
3. An updated model
3.1. Machine state
We redefine the machine state concept from Definition 1
based on the features of modern computer architectures.
We add the complete set of general-purpose registers and
configuration registers (also known as system registers).
We introduce paged virtual memory by substituting the
relocation-bounds register with a fine-grained memory
map. We also extend the machine state with the state of
IO devices.
Definition 3. The machine state S is defined by the memory
E, the processor mode M, the program counter P,
the general-purpose registers G, the configuration registers
C, the memory map A, the MMIO device state D_M
and the PMIO device state D_P:
\[ S \equiv \langle E, M, P, G, C, A, D_M, D_P \rangle . \]
The first three parameters, namely the memory E, the
current processor mode M, and the program counter P,
retain their meaning. To generalize the concept of processor
mode, we define M_P to be the set of privileged
modes⁴, and m_U to be the unprivileged mode, such that
M = (M_P ∪ {m_U}) and m_U ∉ M_P. We also introduce
four new parameters. G contains the value of all general-purpose
registers excluding the program counter⁵. Similarly,
C contains the value of all configuration registers.
We let E refer to the contents of the entire physical address
space. Physical memory is now accessed using the
virtual to physical address translation map A. Last but not
least, D_M and D_P refer to the state of IO devices. We omit
all IO device registers from E.
In our model, instructions that communicate with devices
always alter either D_M or D_P. We also exclude any
external influences from modifying D_M and D_P during
the execution of any instruction. This limitation of the
model is required to model instruction behaviour accu-
rately and independently of timing behaviour of devices.
Device state may change in between the execution of in-
structions.
⁴ Some architectures provide more than one privileged mode. We assume
that all such modes are equally privileged.
⁵ The ARM architecture offers a set of 16 “general-purpose” registers,
which contains the program counter as R15 [23].
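For readers who prefer code to tuples, the updated machine state and the notion of an instruction as a function on states can be sketched in Python as follows. The field types, mode names and the toy instruction are illustrative assumptions; the model itself does not prescribe any encoding.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

PRIVILEGED_MODES = frozenset({"svc", "irq"})   # M_P (assumed mode names)
USER_MODE = "usr"                              # m_U


@dataclass(frozen=True)
class MachineState:
    E: Tuple[int, ...]                    # contents of the physical address space
    M: str                                # current processor mode
    P: int                                # program counter
    G: Tuple[int, ...]                    # general-purpose registers, without P
    C: Tuple[int, ...]                    # configuration (system) registers
    A: Tuple[Tuple[int, int, str], ...]   # address map entries (v, p, x)
    D_M: Tuple[int, ...]                  # MMIO device state
    D_P: Tuple[int, ...]                  # PMIO device state


# An instruction maps one machine state to another.
Instruction = Callable[[MachineState], MachineState]


def add_r0_r1(s: MachineState) -> MachineState:
    """Toy innocuous instruction: G[0] <- G[0] + G[1], then advance P."""
    g = (s.G[0] + s.G[1],) + s.G[1:]
    return replace(s, G=g, P=s.P + 4)


s0 = MachineState(E=(0,) * 4, M=USER_MODE, P=0, G=(2, 3), C=(0,),
                  A=((0, 0, "rw"),), D_M=(), D_P=())
s1 = add_r0_r1(s0)
```

States are immutable here so that each instruction visibly produces a new state rather than modifying the old one in place.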
3.2. Address mapping
The address map A is a many-to-one map, meaning that
many virtual addresses can correspond to the same phys-
ical address. Each entry in A also contains an access per-
mission specifier, which may be used to restrict access to
read-only, or can forbid access altogether. Without loss
of generality, we can assume that all memory accesses are
virtual. When virtual addressing is turned off, no page tables
are active and A would be a one-to-one identity map.
Definition 4. An address map A is a set of 3-tuples
(v, p, x) in which v ∈ V represents a virtual address, p ∈ P
represents the physical address v is mapped to, and x ∈ X
represents the access permission specifier. For any v ∈ V ,
there is at most one 3-tuple in A that contains v.
Definition 5. Let V_A be the set of virtual addresses
mapped by A:
\[ V_A = \{\, v \mid v \in V \wedge \exists (p, x) \in (P, X) : (v, p, x) \in A \,\} ; \]
the translation function T_A for the address map A is then
defined as:
\[ T_A : V_A \longrightarrow (P, X), \qquad v \longmapsto T_A(v) = (p, x). \]
Upon every memory access, the address translation
function T_A(v) takes the virtual address v of the access
and finds its corresponding physical address p using the
address map A. If A does not contain an entry for v or
if the access permission specifier x indicates that the re-
quested kind of access is not allowed, a memory trap oc-
curs.
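A minimal Python sketch of this translation step is given below. It assumes that the address map is stored as a dictionary from virtual address to a (physical address, permissions) pair, which enforces the at-most-one-entry rule of Definition 4, and that permissions are encoded as a string over the letters "r" and "w"; both choices are ours.

```python
class MemoryTrap(Exception):
    """Raised when an access is unmapped or not permitted."""


def translate(address_map, v, access):
    """Sketch of T_A followed by the permission check.

    address_map: dict mapping v -> (p, x); x is a string such as "rw", "r" or "".
    access: "r" for a load or "w" for a store.
    """
    entry = address_map.get(v)
    if entry is None:
        raise MemoryTrap(f"no mapping for {v:#x}")
    p, x = entry
    if access not in x:
        raise MemoryTrap(f"{access!r} access to {v:#x} not allowed")
    return p


# Hypothetical map: two virtual pages alias the same physical page (many-to-one).
A = {
    0x1000: (0x8000, "rw"),
    0x2000: (0x8000, "r"),
    0x3000: (0x9000, ""),     # mapped, but all access is forbidden
}


def traps(v, access):
    try:
        translate(A, v, access)
    except MemoryTrap:
        return True
    return False
```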
An operating system may create its own set of page tables,
mapping a set of virtual addresses V_g to a set of physical
addresses P_g. When this operating system is virtualized,
its physical addresses become virtual in the context
of the VMM. The VMM is responsible for mapping guest
physical addresses to host physical addresses P_t. Let A_g
and A_h denote the sets of all valid mappings in the operating
system’s page table and the VMM’s page table, respectively:
\[ \forall v_g \in V_g : \exists (p_g, x_g) \in (P_g, X),\ (p_t, x_t) \in (P_t, X) :
   T_{A_g}(v_g) = (p_g, x_g) \wedge T_{A_h}(p_g) = (p_t, x_t). \]
A typical memory management unit (MMU) can only
perform a single address translation in hardware. A dou-
ble translation, as described above, would require trans-
lation by software which in turn would require all guest
memory accesses to trap, thereby sacrificing the efficiency
property. This problem can be solved by composing the
guest and VMM mappings into a set of direct mappings
A_s from guest virtual to host physical addresses. Such an
address map is also known as a shadow address map [1, 14, 48].
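The composition of the two mappings can be sketched in Python as follows. The dictionary encoding of address maps and the rule that the shadow entry receives the intersection of guest and host permissions are assumptions made for this illustration; a real VMM may derive shadow permissions differently.

```python
def compose_shadow_map(guest_map, host_map):
    """Compose A_g (guest virtual -> guest physical) with A_h (guest physical
    -> host physical) into a shadow map A_s (guest virtual -> host physical).

    Both maps are dicts of the form address -> (target, permissions).
    """
    shadow = {}
    for vg, (pg, xg) in guest_map.items():
        if pg not in host_map:
            continue                       # left unmapped: the access will trap
        pt, xh = host_map[pg]
        xs = "".join(c for c in xg if c in xh)   # assumed: intersect permissions
        shadow[vg] = (pt, xs)
    return shadow


# Hypothetical guest and host maps.
A_g = {0x1000: (0x0000, "rw"), 0x2000: (0x1000, "r"), 0x3000: (0x9000, "rw")}
A_h = {0x0000: (0x80000, "rw"), 0x1000: (0x81000, "rw")}
A_s = compose_shadow_map(A_g, A_h)
```

With the shadow map installed, the MMU performs a single translation per access, and only accesses without a valid shadow entry reach the VMM.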
Creation and maintenance of shadow address maps is
one of the most complicated tasks of a VMM. The shadow
address map must contain all virtual addresses mapped
by the guest. Since it is common for guests to be relocated
in the physical address space by the VMM, a new formal
requirement has to be imposed on the allocator. Let E_n be
the contents of the physical address space, and let A_g be
the address map that translates virtual addresses v ∈ V_g
to physical addresses in E_n on a machine without a VMM
present. Let E_v and A_s denote the physical address space
and the address map with a VMM installed. The formal
requirement is that at any time, every virtual address v_g is
mapped by A_s and A_g to the same physical data, or memory
traps otherwise:
\[
\forall v_g \in V_g : \exists (p_g, x_g) \in (P_g, X),\ (p_t, x_s) \in (P_t, X) :
\Bigl( T_{A_g}(v_g) = (p_g, x_g) \wedge T_{A_s}(v_g) = (p_t, x_s)
\Rightarrow E_v[p_t] = E_n[p_g] \Bigr)
\vee x_s \text{ causes a memory trap.} \tag{1}
\]
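Requirement (1) can be checked mechanically on small examples. The Python sketch below assumes dictionary-based address maps and memories, and it models "x_s causes a memory trap" as either a missing shadow entry or an empty permission string; these encodings are illustrative assumptions only.

```python
def satisfies_requirement_1(A_g, A_s, E_n, E_v):
    """Check that every guest virtual address sees the same data through the
    shadow map as it would natively, or else traps.

    A_g, A_s: dicts v -> (p, x); E_n, E_v: dicts physical address -> contents.
    """
    for vg, (pg, _xg) in A_g.items():
        entry = A_s.get(vg)
        if entry is None:
            continue                  # unmapped in A_s: the access traps
        pt, xs = entry
        if xs == "":
            continue                  # assumed encoding of "x_s causes a trap"
        if E_v[pt] != E_n[pg]:
            return False              # guest would observe different data
    return True


# Hypothetical native and virtualized configurations.
A_g = {0: (0, "rw"), 1: (1, "rw")}
E_n = {0: "a", 1: "b"}
A_s = {0: (10, "rw"), 1: (11, "rw")}
E_v_good = {10: "a", 11: "b"}
E_v_bad = {10: "a", 11: "z"}
A_s_trapping = {0: (10, "rw"), 1: (11, "")}
```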
In addition to the mappings from A_g, a VMM will add
its own set of system mappings to A_s. These system mappings
comprise the memory space occupied by the hypervisor
and MMIO devices. Because the hypervisor and its
guests have to coexist in the same address space, mappings
may overlap. This is certainly the case for MMIO
devices. It is the job of the VMM to maintain separation
of guest and VMM resources by setting appropriate access
permissions on the overlapping regions in the map A_s so
that unprivileged code is denied access.
3.3. Instruction behaviour
This section redefines the notions of privileged, sensi-
tive and innocuous instructions using the new machine
state model. We adopt Definition 2 that states an instruc-
tion is a function that maps one machine state to another.
We write I for the set of all instruction functions, and S
for the set of all possible machine state mapping functions.
Because typically not every machine state can be reached
through instructions, I ⊂ S .
Definition 6. An instruction i is privileged if
for any two states S_1 = 〈e, m_U, p, g, c, a, d_M, d_P〉 and
S_2 = 〈e, m_2, p, g, c, a, d_M, d_P〉, where m_2 ∈ M_P and both
i(S_1) and i(S_2) do not memory trap, i(S_1) causes a trap
and i(S_2) does not. This trap is referred to as a privileged
instruction trap.
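On a toy machine, Definition 6 can be tested by executing an instruction in user mode and in each privileged mode from otherwise identical states. The Python sketch below uses a deliberately reduced state (mode, one system register, one general-purpose register) and hypothetical instructions; it checks the definition over a finite sample of states only and ignores memory traps.

```python
from collections import namedtuple

State = namedtuple("State", "mode sysreg r0")   # reduced toy state (assumption)
USER_MODE = "usr"
PRIVILEGED_MODES = ("svc",)


class PrivilegedInstructionTrap(Exception):
    pass


def write_sysreg(s):
    """Toy privileged instruction: traps in user mode."""
    if s.mode == USER_MODE:
        raise PrivilegedInstructionTrap()
    return s._replace(sysreg=s.r0)


def read_mode(s):
    """Toy instruction that exposes the mode but never traps."""
    return s._replace(r0=1 if s.mode in PRIVILEGED_MODES else 0)


def traps(i, s):
    try:
        i(s)
    except PrivilegedInstructionTrap:
        return True
    return False


def is_privileged(i, sample_states):
    """i must trap in user mode and must not trap in any privileged mode."""
    for s in sample_states:
        if not traps(i, s._replace(mode=USER_MODE)):
            return False
        if any(traps(i, s._replace(mode=m)) for m in PRIVILEGED_MODES):
            return False
    return True


SAMPLE = [State("usr", 0, 0), State("usr", 1, 7)]
```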
All instructions which change the processor mode,
modify the system registers, modify the address map or
communicate with PMIO devices are control-sensitive.
We exclude communication with MMIO devices to prevent
treating generic memory instructions as sensitive (see
Section 2.4.3).
Definition 7. An instruction i is control-sensitive if there
exists a state S = 〈e_1, m_1, p_1, g_1, c_1, a_1, d_M, d_{P_1}〉 and for i(S) =
〈e_2, m_2, p_2, g_2, c_2, a_2, d_M, d_{P_2}〉 we have:
\[ (m_1 \neq m_2 \vee c_1 \neq c_2 \vee a_1 \neq a_2 \vee d_{P_1} \neq d_{P_2})
   \wedge i(S) \text{ does not memory trap.} \]
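The following Python sketch evaluates Definition 7 over a finite sample of states. The state encoding and the four toy instructions are hypothetical; the point of the example is that a store to an MMIO device changes d_M only and is therefore not flagged, whereas a port write changes d_P and is.

```python
from collections import namedtuple

State = namedtuple("State", "e m p g c a d_M d_P")   # toy encoding (assumption)


class MemoryTrap(Exception):
    pass


def is_control_sensitive(i, sample_states):
    """True if, for some sampled state, i changes the mode, the configuration
    registers, the address map or the PMIO state without a memory trap."""
    for s in sample_states:
        try:
            t = i(s)
        except MemoryTrap:
            continue
        if (s.m, s.c, s.a, s.d_P) != (t.m, t.c, t.a, t.d_P):
            return True
    return False


# Hypothetical toy instructions.
def add(s):
    return s._replace(g=s.g + 1, p=s.p + 4)

def set_mode_svc(s):
    return s._replace(m="svc", p=s.p + 4)

def out_port(s):
    return s._replace(d_P=s.g, p=s.p + 4)

def store_mmio(s):
    return s._replace(d_M=s.g, p=s.p + 4)


SAMPLE = [State(e=(), m="usr", p=0, g=1, c=0, a=(), d_M=0, d_P=0)]
```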
Mode-sensitive instructions behave differently when
executed in machine states which differ solely in their
mode.
Definition 8. An instruction i is mode-sensitive
if there exist two states S_1 = 〈e, m_1, p, g, c, a, d_M, d_P〉
and S_2 = 〈e, m_2, p, g, c, a, d_M, d_P〉 with
m_1 ≠ m_2, such that i(S_1) and i(S_2) do not memory
trap, and for i(S_1) = 〈e_1, m_1^*, p_1, g_1, c_1, a_1, d_{M_1}, d_{P_1}〉 and
i(S_2) = 〈e_2, m_2^*, p_2, g_2, c_2, a_2, d_{M_2}, d_{P_2}〉 we have:
\[ e_1 \neq e_2 \vee m_1^* \neq m_2^* \vee p_1 \neq p_2 \vee g_1 \neq g_2
   \vee c_1 \neq c_2 \vee a_1 \neq a_2 \vee d_{M_1} \neq d_{M_2} \vee d_{P_1} \neq d_{P_2}. \]
We introduce a new class of sensitive instructions, similar
to mode-sensitive instructions. Their behaviour depends
on the set of system registers C.
Definition 9. An instruction i is configuration-sensitive
if there exist two states S_1 = 〈e, m, p, g, c_1, a, d_M, d_P〉
and S_2 = 〈e, m, p, g, c_2, a, d_M, d_P〉 with
c_1 ≠ c_2, such that i(S_1) and i(S_2) do not memory
trap, and for i(S_1) = 〈e_1, m_1, p_1, g_1, c_1^*, a_1, d_{M_1}, d_{P_1}〉 and
i(S_2) = 〈e_2, m_2, p_2, g_2, c_2^*, a_2, d_{M_2}, d_{P_2}〉 we have:
\[ e_1 \neq e_2 \vee m_1 \neq m_2 \vee p_1 \neq p_2 \vee g_1 \neq g_2
   \vee c_1^* \neq c_2^* \vee a_1 \neq a_2 \vee d_{M_1} \neq d_{M_2} \vee d_{P_1} \neq d_{P_2}. \]
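Mode and configuration sensitivity can be probed on a toy machine by running an instruction from states that differ in a single component and comparing the results. The Python sketch below makes one assumption that should be stated loudly: it compares all result components except the one that was deliberately varied, because an instruction that merely preserves that component would otherwise always be flagged. The state encoding and instructions are hypothetical, and memory traps are not modelled.

```python
from collections import namedtuple

State = namedtuple("State", "e m p g c a d_M d_P")   # toy encoding (assumption)


def is_sensitive_to(field, i, sample_states, values):
    """Vary one state component over `values` and report whether the results
    of i differ in any other component (assumed comparison rule)."""
    for s in sample_states:
        outcomes = [i(s._replace(**{field: v}))._replace(**{field: None})
                    for v in values]
        if any(o != outcomes[0] for o in outcomes[1:]):
            return True
    return False


def is_mode_sensitive(i, sample_states, modes=("usr", "svc")):
    return is_sensitive_to("m", i, sample_states, modes)


def is_configuration_sensitive(i, sample_states, configs=(0, 1)):
    return is_sensitive_to("c", i, sample_states, configs)


# Hypothetical toy instructions.
def add(s):
    return s._replace(g=s.g + 1, p=s.p + 4)

def read_mode(s):
    return s._replace(g=1 if s.m == "svc" else 0, p=s.p + 4)

def read_config(s):
    return s._replace(g=s.c, p=s.p + 4)


SAMPLE = [State(e=(), m="usr", p=0, g=5, c=0, a=(), d_M=0, d_P=0)]
```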
Popek and Goldberg also introduced the notion of
location-sensitive instructions, which were able to bypass
memory relocation. An example of such a sensitive instruction
is the Load Real Address (LRA) instruction from
the IBM 360/67 instruction set. In a modern architecture,
such instructions would bypass address translation and
expose absolute physical addresses. In virtualized sys-
tems it is common for guests to be relocated in physi-
cal memory by the VMM. Location-sensitive instructions
would therefore break the equivalence property. How-
ever, we are not aware of any such instructions in modern
processor architectures including ARM, MIPS, PowerPC,
SuperH and even x86 [4, 32, 35, 40, 41, 42, 49].
At first sight, a new difficulty arises on architectures
that have an explicitly visible program counter. Use of
the program counter as an operand in any instruction
may break the equivalence property if a guest is relocated.
However, current memory management techniques en-
able a VMM to map guests to the same virtual address
as they would see when executed natively. Given that re-
quirement (1) in Section 3.2 is met, this will be the case,
and instructions that operate on the program counter will
have no influence on the virtualizability of a system.
3.4. Events
Multiple definitions exist for the terms interrupt and exception.
On ARM, for example, interrupts are seen as a particular
subclass of exceptions [4], while the x86 manual describes
them as different and unrelated types of events [36].
From this point on, we refer to the whole set of interrupts
and exceptions as events.
Events cause the processor to save its current state and
enter an event handler in a privileged mode. Events are
either synchronous or asynchronous. A synchronous event
directly results from the execution of an instruction. Syn-
chronous events are already included in our model as
traps. In modern architectures, several kinds of traps ex-
ist. They can usually be classified in one of the following
categories:
• Privileged instruction traps are the result of executing
a privileged instruction.
• Memory traps are caused by instructions or instruction
fetches attempting to access an address that is either
invalid or inaccessible with respect to the current
machine state S and access bits X.
• Arithmetic traps may occur when attempting to per-
form invalid arithmetic operations (such as division
by zero), or when a hardware FPU lacks an imple-
mentation for a requested floating point operation, in
which case software must emulate the operation.
• Undefined instruction traps may occur when executing
an instruction which is not recognized by the proces-
sor.
An asynchronous event may happen at any time, unrelated
to the instruction being executed. In practice, all
interrupts generated by IO devices are asynchronous. When
an asynchronous event happens while the processor is
executing an instruction, it will cause the processor to revert
or to complete that instruction. If not, the processor would
be left in an inconsistent state. Hence, all instruction
functions i are independent of such events.
Definition 10. An asynchronous event is a function e that
brings a processor from one machine state into another of
which the mode is privileged, without and unrelated to
the execution of an instruction:
$$e : \Sigma \longrightarrow \{\, S\langle e, m, p, g, c, a, d_M, d_P \rangle \mid S \in \Sigma \wedge m \in M_P \,\}$$
$$S_x \longmapsto e(S_x) = S_y .$$
Hence, asynchronous events can be expressed in the
same way as instruction execution. We write E for the
set of asynchronous event functions. Because of the
constraint on M, E is a proper subset of S.
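As a minimal sketch of Definition 10, an asynchronous event can be written as an ordinary function on machine states whose result is always in a privileged mode. The state below is reduced to mode, program counter and registers; the names State, timer_event and HANDLER_ADDRESS are illustrative assumptions and not part of the model.

```python
from dataclasses import dataclass, replace

PRIVILEGED = "privileged"
UNPRIVILEGED = "unprivileged"
HANDLER_ADDRESS = 0x100  # assumed entry point of the event handler


@dataclass(frozen=True)
class State:
    """Simplified machine state: only M, P and G are modelled."""
    mode: str
    pc: int
    regs: tuple


def timer_event(s: State) -> State:
    """Hypothetical asynchronous event: maps any state to a privileged one."""
    return replace(s, mode=PRIVILEGED, pc=HANDLER_ADDRESS)


def satisfies_definition_10(e, states) -> bool:
    """Check on sample states that e always yields a privileged mode."""
    return all(e(s).mode == PRIVILEGED for s in states)


samples = [
    State(UNPRIVILEGED, 0x8000, (1, 2)),
    State(PRIVILEGED, 0x20, (0, 0)),
]
print(satisfies_definition_10(timer_event, samples))
```

The check only samples a few states, so it illustrates the constraint on the resulting mode rather than proving it for all of Σ.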
3.5. Result
Based on our updated model and its definitions, we
conclude that Theorem 1 remains unmodified. However,
we need to impose a new formal constraint on the alloca-
tor of the VMM (see (1) in Section 3.2) to support paged
virtual memory.
The proof of the theorem also requires a subtle update.
The consequence of adding asynchronous events to the
model is that for any instruction sequence
$(i_1, i_2, \ldots, i_n)$ in which no instruction generates a
synchronous event, we can no longer guarantee that
$S_{k+1} = i_k(S_k)$ $(1 \le k < n)$, because asynchronous
events may occur at any time. The same observation applies
to changes in device state due to timing effects or external
influences. Although neither the model developed by Popek
and Goldberg nor our extended model relies on sequences of
instructions, the proof of the theorem that builds upon this
model does. More precisely, such sequences are used in the
formalisation of the equivalence property as a homomorphism
on the set of machine states Σ. However, as we have
shown, asynchronous events may be modelled as functions
that map one machine state onto another, similar to
instructions. The same reasoning can be applied to
asynchronous changes in IO device state. It suffices to
generalize the notion of instruction functions i ∈ I to
functions of the set S to formalise the equivalence property
under the new model.
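The generalization can be pictured as folding a mixed sequence of state functions over an initial state. The following is a hedged sketch only: states are plain tuples (mode, pc, accumulator), and inc and irq are hypothetical stand-ins for an instruction function and an asynchronous event function.

```python
from functools import reduce


def run(state, steps):
    """Apply a sequence of state functions (instructions or events) in order."""
    return reduce(lambda s, f: f(s), steps, state)


def inc(state):
    """Hypothetical instruction: advance the PC and increment the accumulator."""
    mode, pc, acc = state
    return (mode, pc + 4, acc + 1)


def irq(state):
    """Hypothetical asynchronous event: enter a privileged handler at 0x100."""
    _mode, _pc, acc = state
    return ("privileged", 0x100, acc)


s0 = ("user", 0, 0)
undisturbed = run(s0, [inc, inc])       # pure instruction sequence
interrupted = run(s0, [inc, irq, inc])  # an event arrives between instructions
print(undisturbed, interrupted)
```

Both traces are compositions of functions on machine states, which is all the equivalence argument needs; the sketch deliberately omits details such as saving the return address on event entry.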
4. Analysis of the ARM architecture
In this section, we present an analysis of the bare
ARMv7-A architecture [4], i.e., without the upcoming
virtualization extensions, based on our updated model.
An analysis of ARMv7-A including the new extensions
follows in Section 5.1. ARM provides four different
instruction sets: 32-bit (fixed-width) ARM, Thumb-2,
Thumb Execution Environment (ThumbEE) and Jazelle.
We will omit ThumbEE and Jazelle from our discussion.
ThumbEE closely resembles Thumb-2, with few additions
and modifications, and is deprecated with the upcom-
ing virtualization extensions. Jazelle’s specification is not
publicly available, and it cannot be combined with the
upcoming virtualization extensions [8].
4.1. Machine state
We recall the definition of machine state in our new
model:
$$S \equiv \langle E, M, P, G, C, A, D_M, D_P \rangle .$$
On ARM [4], multiple equally privileged modes exist;
they are system mode (SYS), supervisor mode (SVC), in-
terrupt mode (IRQ), fast interrupt mode (FIQ), abort mode
(ABT) and undefined mode (UND). There is only one un-
privileged mode: user mode (USR). M is always one of
these modes. The mode is altered explicitly by instruc-
tions or implicitly when an event arrives.
There are 16 general purpose registers, labelled R0 to
R15. We exclude R15 from G, because it represents the
program counter P . Some registers are duplicated for dif-
ferent modes; this technique is called banking. Unbanked
registers are shared between all processor modes. The
stack pointer register (R13) and the link register (R14) are
banked for all privileged modes; the instances for SYS are
shared with USR. Furthermore, FIQ has its own banked set
of registers R8 to R12. The set G includes all of the men-
tioned registers, excluding P .
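The banking rules described above can be sketched as a lookup from (mode, register number) to a physical register. This is an illustrative model of the text only; the function name and the label format (for example R13_svc) are assumptions.

```python
MODES = ("USR", "SYS", "SVC", "IRQ", "FIQ", "ABT", "UND")


def physical_register(mode: str, n: int) -> str:
    """Return a label for the physical register behind Rn in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 0 <= n <= 15:
        raise ValueError(f"no such register: R{n}")
    if n == 15:
        return "PC"  # R15 is the program counter P, not part of G
    if mode == "FIQ" and 8 <= n <= 12:
        return f"R{n}_fiq"  # FIQ has its own banked R8 to R12
    if n in (13, 14):
        # SP and LR are banked per privileged mode; SYS shares them with USR
        bank = "usr" if mode in ("USR", "SYS") else mode.lower()
        return f"R{n}_{bank}"
    return f"R{n}"  # unbanked: shared between all modes


# The set G under this model: every physical register except the PC
G = {physical_register(m, n) for m in MODES for n in range(15)}
print(len(G))
```

Enumerating all modes this way makes explicit that G contains more physical registers than the 15 names visible at any one time.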
The currently active processor state is represented in the
current program status register (CPSR). It stores the pro-
cessor mode M , data memory endianness and interrupt
mask bits among other fields. There are also five saved
program status registers (SPSRs). SPSRs have the same lay-
out: they are copies of the CPSR register used as backup
upon entry of an event handler.
Although it sounds logical to include the entire set of
program status registers into the set of configuration reg-
isters C, there are two pitfalls. Firstly, because the mode
field of the CPSR always reflects the current processor
mode M , including M in C would render the definition
of mode-sensitive instructions (Definition 8) useless.
Secondly, part of the CPSR reflects global state, while the
other part represents a state specific to the current mode M.
The latter part is also exposed to user mode. For example,
data memory endianness can be set by software executing in
user mode, and will only affect user mode. However, data
memory endianness is part of the CPSR.
We clearly cannot add the entire CPSR to the set of
configuration registers C, as it would make our analysis
overly restrictive. Altering the mode-specific state for exe-
cution in one mode from another is always done through
the SPSRs. Hence, we can safely omit any mode-specific
state from the CPSR in C, but we must include the SPSRs
completely.
The set C also contains all of the system control copro-
cessor (CP15) registers. This coprocessor is used for cache,
memory and whole system configuration, in addition to
debugging and monitoring. Its registers can be accessed
using dedicated coprocessor register read or write
instructions.
We also model an event register in the set C. This reg-
ister is part of the mechanism of halting and resuming
instruction execution based on machine specific system
events and should not be confused with our usage of the
term event in this paper for the whole of interrupts and
exceptions.
The ARM architecture provides an MMU with support
for paged virtual memory (see footnote 6). Hence, address translation
maps can be implemented using page tables.
All IO on ARM is memory-mapped—PMIO is not sup-
ported. Hence, DP is empty. Interactions with coproces-
sors are not seen as IO operations.
4.2. 32-bit ARM instruction behaviour
There are many sensitive instructions on ARM. Below,
we provide a detailed analysis of all sensitive instructions,
grouped by purpose. The results of our analysis are sum-
marized in Table 1.
Footnote 6: The application profile is the only architecture that
provides an MMU. Other ARM architecture profiles provide a simpler
memory protection scheme that does not support page tables.
Table 1: Sensitive and privileged 32-bit ARM instructions

Instruction                     Control  Mode  Conf.  Priv.
CPS                                •      •      ◦      ◦
LDC                                ◦      •      •      ◦
LDM (exception return)             •      •      •      ◦
LDM (user registers)               ◦      •      ◦      ◦
MCR                                •      •      ◦      ◦
MRC                                ◦      •      •      ◦
MRS (SPSR)                         ◦      •      •      ◦
MSR                                •      •      ◦      ◦
RFE                                •      •      ◦      ◦
SEV                                •      ◦      ◦      ◦
SRS                                ◦      •      •      ◦
STC                                •      •      ◦      ◦
STM (user registers)               ◦      •      ◦      ◦
SVC                                •      ◦      ◦      •
SUBS, ... (exception return)       •      •      •      ◦
WFE                                •      ◦      •      ◦
WFI                                •      ◦      •      ◦
4.2.1. Coprocessor instructions
The ARM instruction set contains a number of instruc-
tions to interact with coprocessors. In this paper, we limit
our discussion to a basic implementation of the ARM ar-
chitecture without extensions. Hence, only two coproces-
sors are available in the system: CP14, which is used for
debugging and tracing, and CP15, the system control
coprocessor. All their registers are part of the set C.
Out of all instructions there are only four that can oper-
ate on these coprocessors:
• MCR and STC are control-sensitive because they write
data to a coprocessor’s registers or memory;
• LDC and MRC are configuration-sensitive because they
load data from a coprocessor’s registers or memory.
A coprocessor may also deny access from USR mode.
Hence, all of the above instructions are also mode-
sensitive.
In systems with extra coprocessors, these coprocessors
have to be carefully analysed to determine whether any of
their registers belong to C. The specification of the copro-
cessor also determines the set of instructions that can op-
erate on it. This set may not be limited to the instructions
discussed above; it may also include any of the follow-
ing: CDP, CDP2, LDCL, LDC2, LDC2L, MCR2, MCRR, MCRR2, MRC2,
MRRC, MRRC2, STCL, STC2 and STC2L. The effect of each of
these instructions, including the ones mentioned above,
must be studied for each coprocessor individually to de-
termine which instructions are sensitive.
4.2.2. Event handling
ARM provides a plethora of instructions for handling
events:
• LDM, SUBS, MOVS, MVNS, ADCS, ADDS, ANDS, BICS, EORS,
ORRS, RSBS, RSCS and SBCS can all be used to return
from event handlers. Since this updates the processor
mode by definition, all these instructions are control-
sensitive. These instructions are also configuration-
sensitive, because they copy the SPSR into the CPSR.
• Another variant of LDM can be used to load values into
USR mode registers from memory. A similar form of
STM provides the inverse operation.
• RFE is control-sensitive because it loads values into
the program counter (R15) and the CPSR from mem-
ory.
• SRS is configuration-sensitive because it stores the
value of the link register (R14) and the current SPSR
to memory.
All of the above instructions are also mode-sensitive,
because their behaviour in USR mode is unspecified.
4.2.3. Direct modification of system registers
Some instructions directly read or write to system reg-
isters such as the CPSR or one of its banked versions:
• MRS reads from the SPSR, hence it is both
configuration-sensitive and mode-sensitive.
• CPS and MSR write to system registers. The former
acts as a NOP while the latter is unpredictable when
executed in USR mode. Hence, they are both control-
sensitive and mode-sensitive.
• SVC is used as a system call (or software interrupt) by
applications to call the operating system. It uncon-
ditionally changes the current mode to supervisor
mode (SVC). Hence, it is control-sensitive.
4.2.4. Sleep and wake up
In a multiprocessor system, software executing on dif-
ferent processors can communicate using events. The
architecture provides two instructions for this purpose:
send event (SEV) and wait for event (WFE). Software can
also hint the processor that it is waiting for an interrupt or
external event through the WFI instruction. While waiting,
the processor may go to a low-power state. After an ex-
ternal event or interrupt occurs, the one-bit abstract event
register will be asserted. The processor wakes up and this
bit is cleared. Since the abstract event register is part of the
system state, the SEV, WFE and WFI instructions are
control-sensitive.
4.3. Thumb-2 instruction behaviour
The Thumb-2 instruction set provides more or less the
same instructions as the 32-bit ARM instruction set.
Moreover, the set of sensitive Thumb-2 instructions is a proper
subset of the set of sensitive ARM instructions, so no elab-
orate discussion is required. The results of our analysis
are summarized in Table 2.
Table 2: Sensitive and privileged Thumb-2 instructions

Instruction                     Control  Mode  Conf.  Priv.
CPS                                •      •      ◦      ◦
LDC                                ◦      •      •      ◦
MCR                                •      •      ◦      ◦
MRC                                ◦      •      •      ◦
MRS (SPSR)                         ◦      •      •      ◦
MSR                                •      •      ◦      ◦
RFE                                •      •      ◦      ◦
SEV                                •      ◦      ◦      ◦
SRS                                ◦      •      •      ◦
STC                                •      •      ◦      ◦
SVC                                •      ◦      ◦      •
SUBS (exception return)            •      •      •      ◦
WFE                                •      ◦      •      ◦
WFI                                •      ◦      •      ◦
4.4. Conclusion
Investigating the set of sensitive instructions in the
ARM and Thumb-2 instruction sets in Tables 1 and 2, it is
clear that both instruction sets contain sensitive instruc-
tions that are not privileged. Based on our findings in
Section 3.5, we conclude that the ARM architecture is not
classically virtualizable.
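The conclusion can be re-derived mechanically from Table 1. The sketch below transcribes the table rows as bit strings (control, mode, configuration, privileged) and lists every instruction that is sensitive but not privileged; the helper names are illustrative assumptions.

```python
# Rows of Table 1 as "name|CMFP": control-, mode-, configuration-sensitive, privileged
TABLE_1_ROWS = """
CPS|1100
LDC|0110
LDM (exception return)|1110
LDM (user registers)|0100
MCR|1100
MRC|0110
MRS (SPSR)|0110
MSR|1100
RFE|1100
SEV|1000
SRS|0110
STC|1100
STM (user registers)|0100
SVC|1001
SUBS, ... (exception return)|1110
WFE|1010
WFI|1010
"""


def parse(rows):
    """Turn the row text into {name: (control, mode, conf, priv)} booleans."""
    table = {}
    for line in rows.strip().splitlines():
        name, bits = line.split("|")
        table[name] = tuple(b == "1" for b in bits)
    return table


def sensitive_unprivileged(table):
    """Instructions that are sensitive in some way but do not trap."""
    return sorted(
        name
        for name, (control, mode, conf, priv) in table.items()
        if (control or mode or conf) and not priv
    )


def classically_virtualizable(table):
    """Every sensitive instruction must also be privileged."""
    return not sensitive_unprivileged(table)


table_1 = parse(TABLE_1_ROWS)
print(sensitive_unprivileged(table_1))
print(classically_virtualizable(table_1))
```

With the rows as transcribed, only SVC is both sensitive and privileged, so the check fails for every other row of the table.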
5. Full virtualization in practice
The criteria for virtualizability introduced by Popek and
Goldberg [44] and extended in this paper are sufficient but
not necessary to construct an efficient VMM for a partic-
ular architecture. Advances in the construction of VMMs
have enabled full virtualization on architectures that fail
these criteria. However, pure software solutions using bi-
nary rewriting or emulation are typically deemed to in-
troduce too much overhead or to be too complex.
Hardware vendors have also adapted to the need for
virtualization support, and architectures that were for-
merly not classically virtualizable such as x86 have al-
ready been made virtualizable [2, 50]. ARM is follow-
ing the same path with its upcoming virtualization and
large physical address (LPA) extensions for ARMv7-A and
ARMv7-R [5, 6, 7, 8].
In this section, we discuss and compare both software
and hardware approaches. For the latter, we analyse
ARM’s upcoming extensions.
5.1. Hardware support for full virtualization
The upcoming virtualization and LPA extensions introduce
a new processor mode, HYP, and a number of new
instructions. They also impact the behaviour of many existing
instructions, and modify the mechanisms for interrupt
handling, memory management and performance
monitoring.
The new HYP mode is more privileged than the origi-
nal set of privileged modes (SVC, SYS, ABT, UND, IRQ, FIQ);
the latter is now referred to as the set of kernel modes.
HYP mode enables a VMM to run below the operating
system level without forcing the operating system kernel
to run unprivileged. Instead, the operating system ker-
nel can use all kernel modes transparently, as if no VMM
was present. Events for the VMM are handled in the HYP
mode, instead of the traditional exception modes (ABT, UND,
IRQ, FIQ), which are a subset of the kernel modes. This sig-
nificantly reduces the set of sensitive instructions.
In order to make all sensitive instructions executed in
any of the kernel modes trap to HYP mode, the virtualiza-
tion extensions add configurable traps to the architecture. A
VMM can then make certain instructions trap as required.
When running just one operating system without a VMM,
the traps will be disabled.
A quick analysis of the original set of sensitive instruc-
tions confirms the implications stated above:
• Coprocessor instructions can be configured to trap.
These traps provide coarse-grained control over
accesses to coprocessors CP0 to CP13. There
are more fine-grained controls for the remaining
coprocessors—the debug and execution environment
support coprocessor (CP14) and the system control
coprocessor (CP15).
• Memory access and event handler return instructions
are no longer sensitive, since a guest running on a
VMM with hardware extensions can use the exception
modes in the same way as for the non-virtualized
case.
• Instructions that directly modify system state now act
on a guest’s state, rather than on the entire system
state. Hence, they are no longer sensitive.
• Instructions that deal with external events are
adapted to work with a per-guest interrupt state.
Each guest has its own virtual interrupts. Both WFE
and WFI have independent configurable traps. These
traps can be used as hints by the VMM to schedule
other guests. However, the SEV instruction cannot be
made to trap.
It may seem surprising that SEV cannot be made to trap.
It is the only sensitive instruction that remains unprivi-
leged with the new hardware extensions. However, none
of the sleep and wake-up instructions can cause functionally
incorrect behaviour of the VMM [4]. The configurable
traps for WFE and WFI are provided so that a VMM is able
to detect when a guest is idle. The SEV instruction cannot
provide such useful information to the VMM.
Hence, a VMM can be constructed that executes VMs
until they trap to HYP mode. All other additions by the
virtualization and LPA extensions are not strictly neces-
sary but speed up common VMM tasks and reduce the
code size of the VMM. For example, LPA adds a second
stage to the address translation mechanism in the MMU.
This mechanism makes the hardware capable of combin-
ing a translation table for the guest with a translation ta-
ble for the VMM, eliminating the need for software-based
shadow maps.
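The relation between two-stage translation and a software shadow map can be sketched as function composition. The example below is a simplification under stated assumptions: single-level tables modelled as dictionaries from page number to page number, a 4 KiB page size, and hypothetical function names. Real ARM translation tables are multi-level and carry permission bits.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(table, address):
    """Translate an address through a single-level page-number mapping."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"translation fault at {address:#x}")
    return table[page] * PAGE_SIZE + offset


def two_stage(stage1, stage2, virtual_address):
    """Hardware view: guest table first, then the VMM's table."""
    intermediate = translate(stage1, virtual_address)
    return translate(stage2, intermediate)


def shadow_map(stage1, stage2):
    """Software view: precompose both tables into one shadow table."""
    return {vp: stage2[ip] for vp, ip in stage1.items() if ip in stage2}


guest_table = {0x10: 0x2}  # guest virtual page -> guest "physical" page
vmm_table = {0x2: 0x80}    # guest "physical" page -> machine page

print(hex(two_stage(guest_table, vmm_table, 0x10123)))
print(hex(translate(shadow_map(guest_table, vmm_table), 0x10123)))
```

Both paths yield the same machine address. The difference is that a shadow map must be recomputed by the VMM whenever the guest edits its table, whereas the second translation stage lets the hardware perform the composition on each lookup.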
5.2. Dynamic binary translation
Full virtualization can also be supported without mod-
ifications to the hardware; this is typically achieved
through dynamic binary translation (DBT), sometimes
also called software dynamic translation. The concept
of a DBT VMM evolved from the theory of Popek and
Goldberg’s hybrid VMM. An analysis according to our
model remains useful, because it can be used to determine
whether an architecture is suitable for the construction of
a DBT VMM. Furthermore, the analysis will reveal which
instructions will need to be rewritten.
In a hybrid VMM, all instructions normally executed in
a privileged mode are interpreted by the VMM. Instruc-
tions normally executed in unprivileged mode are exe-
cuted natively [44]. In a traditional software stack, a hy-
brid VMM would interpret the OS but execute applica-
tions natively. Popek and Goldberg [44] proved that a hy-
brid VMM can be constructed if all user-sensitive instruc-
tions are also privileged. In our model, user-sensitive in-
structions are defined as follows:
Definition 11. An instruction i is user-sensitive if there
exists a state $S\langle e, m_U, p, g, c, a, d_M, d_P \rangle$ for which i(S) is
control-sensitive and/or configuration-sensitive.
A DBT VMM can achieve a performance benefit over
interpretation by rewriting instruction sequences at run
time [13]. The basic idea is to execute as many rewritten
instructions as possible natively, without intervention of
the VMM. Sensitive instructions must be rewritten such
that they trap to the interpreter of the VMM. Because the
VMM needs to keep track of the execution of its VMs, it
will also need to rewrite control flow instructions.
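The rewriting step can be illustrated with a toy block translator. This is a sketch under stated assumptions, not a description of any existing VMM: instructions are plain assembly strings, SENSITIVE and CONTROL_FLOW are small illustrative subsets, and the output kinds "native", "trap" and "exit" are hypothetical markers for what a real translator would emit.

```python
# Illustrative subsets only; a real translator would decode binary encodings
SENSITIVE = {"CPS", "MCR", "MRC", "MRS", "MSR", "SVC", "WFE", "WFI"}
CONTROL_FLOW = {"B", "BL", "BX"}


def translate_block(instructions):
    """Translate one basic block of guest code.

    Innocuous instructions are kept for native execution, sensitive ones
    are replaced by a trap to the VMM's interpreter, and the block ends at
    the first control flow instruction, where control returns to the VMM.
    """
    translated = []
    for instruction in instructions:
        mnemonic = instruction.split()[0]
        if mnemonic in SENSITIVE:
            translated.append(("trap", instruction))
        elif mnemonic in CONTROL_FLOW:
            translated.append(("exit", instruction))
            break  # the VMM decides which block runs next
        else:
            translated.append(("native", instruction))
    return translated


block = [
    "ADD R0, R0, #1",
    "MCR p15, 0, R0, c1, c0, 0",
    "B loop",
    "MOV R1, #0",
]
for kind, instruction in translate_block(block):
    print(kind, instruction)
```

Ending each translated block at a branch is what lets the VMM keep track of guest control flow. The sketch ignores instructions that write the program counter through other means.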
5.2.1. DBT for the ARM architecture
DBT has been used successfully to virtualize the Intel
x86 architecture by VMware, Microsoft, QEMU, Virtual-
Box and others [16, 30, 52, 53]. The earliest DBT VMM
dates from 1999, several years before the emergence of
hardware extensions for virtualization in 2005 [2, 50].
Early x86 virtualization solutions used DBT, as paravir-
tualization was yet to be invented. Since virtualization for
embedded systems only became a topic of interest much
later, all efforts on virtualizing the ARM architecture were
geared towards paravirtualization. At the time of writing,
to the best of our knowledge no VMM with full
virtualization using DBT has been constructed for the ARM
architecture.
In order to construct a DBT VMM for ARM, it is re-
quired to determine how much of the guest code must be
rewritten: if all user-sensitive instructions are also priv-
ileged, only guest privileged code must be translated,
otherwise, all code must be translated. We can extend
our analysis from Section 4 to analyse user-sensitivity.
As it turns out, there are four user-sensitive instructions,
shared by both ARM and Thumb:
• the supervisor call instruction SVC, because it always
changes the processor mode to SVC;
• the sleep and wake-up instructions SEV, WFE and WFI.
All other sensitive instructions either act as NOP or exhibit
innocuous behaviour when executed in USR mode. Of all
user-sensitive instructions, only SVC is privileged. Hence,
ARM does not meet Popek and Goldberg’s conditions for
a hybrid VMM either.
In practice, the sleep and wake-up instructions are
merely hints for a processor. Even a masked interrupt will
wake up a sleeping processor. Furthermore, if any inter-
rupts are pending, the processor will not sleep either [4].
As such, not intercepting these operations cannot lead to
resource control violations. In other words, we can ignore
them and still obtain functional correctness. When we ef-
fectively do ignore the presence of unprivileged sleep and
wake-up instructions, we can construct a DBT VMM that
only translates privileged guest code. The downside of
this approach is that the VMM cannot intercept sleep in-
structions in unprivileged guest code, which could other-
wise be used by the VM to inform the VMM that the VM is
idle. The same problem applies to paravirtualisation
solutions for ARM: because only guest privileged code (such
as OS kernels) is altered, they have to make the same
assumption. They hence suffer from the same deficiency.
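The decision described above, namely how much guest code a DBT VMM must translate, can be written down as a small set computation. The instruction sets come from the analysis in the text; the function name and the ignore parameter are illustrative assumptions.

```python
USER_SENSITIVE = {"SVC", "SEV", "WFE", "WFI"}
PRIVILEGED = {"SVC"}
SLEEP_AND_WAKE_UP = {"SEV", "WFE", "WFI"}


def translation_scope(user_sensitive, privileged, ignore=frozenset()):
    """Decide which guest code must be translated.

    If some user-sensitive instruction neither traps nor is deliberately
    ignored, unprivileged guest code has to be translated as well.
    """
    unhandled = set(user_sensitive) - set(privileged) - set(ignore)
    return "all guest code" if unhandled else "privileged guest code only"


print(translation_scope(USER_SENSITIVE, PRIVILEGED))
print(translation_scope(USER_SENSITIVE, PRIVILEGED, ignore=SLEEP_AND_WAKE_UP))
```

The second call models the pragmatic choice of treating the sleep and wake-up instructions as harmless hints, at the cost of not seeing idle guests.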
Using DBT also requires the VMM to keep track of guest
control flow. On ARM, the program counter is explicitly
visible as R15 and can be altered by several instructions,
including ALU instructions. This complicates the design
of an instruction decoder for DBT.
Furthermore, the program counter can be used as
source operand for even more instructions [4, 26, 43]. Be-
cause the width of instructions is fixed at 16 or 32 bits,
absolute address operands cannot be encoded. There-
fore, distributed literal pools—pools of data embedded in
code—are used which are accessed using PC-relative ad-
dressing. Other architectures offer instructions that can
encode absolute addresses in full, such as on x86, or use
a single global offset table instead of distributed embedded
data pools, such as on Alpha [39].
We call instructions that depend on the value of the pro-
gram counter virtual-location-sensitive. Formally, they are
defined as follows:
Definition 12. An instruction i is virtual-location-sensitive
if, given two states $S_1\langle e, m, p, g, c, a, d_M, d_P \rangle$
and $S_2\langle e, m, p + r, g, c, a, d_M, d_P \rangle$ for some
offset $r \in \mathbb{Z}^*$ such that $i(S_1)$ and $i(S_2)$ do not memory trap,
for $i(S_1) = \langle e_1, m_1, p_1, g_1, c_1, a_1, d_{M_1}, d_{P_1} \rangle$ and
$i(S_2) = \langle e_2, m_2, p_2, g_2, c_2, a_2, d_{M_2}, d_{P_2} \rangle$ we have:
$$e_1 \neq e_2 \vee m_1 \neq m_2 \vee p_1 \neq (p_2 - r) \vee g_1 \neq g_2 \vee c_1 \neq c_2 \vee a_1 \neq a_2 \vee d_{M_1} \neq d_{M_2} \vee d_{P_1} \neq d_{P_2} .$$
Virtual-location-sensitive instructions cannot be relo-
cated in the virtual address space; as opposed to Popek
and Goldberg’s location-sensitive instructions which can-
not be relocated in the physical address space. Virtual-
location-sensitive instructions are innocuous to execute-
to-trap VMMs if requirement (1) on the address map is
met (see Section 3.2).
Virtual-location-sensitive instructions wreak havoc
with DBT VMMs because privileged guest code cannot
be executed in place, as opposed to unprivileged guest
code. In-place translation and execution is impossible
for several reasons: firstly, a VMM cannot prevent a
guest from observing the alterations to its code, and
secondly, the size of the translated code will not nec-
essarily match the size of the original code. Instead,
code is translated to a cache located elsewhere in the
virtual address space. The program counter observed
by the translated code will be incorrect unless the VMM
also intercepts virtual-location-sensitive instructions.
Optimization techniques can be used to rewrite virtual-
location-sensitive instructions to equivalent non-sensitive
instruction sequences, instead of merely replacing them
by a trapping instruction [43].
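One such rewrite can be sketched for a PC-relative literal load. The example assumes ARM state, where reading R15 yields the address of the current instruction plus 8, and uses a hypothetical tuple encoding in which LOAD_CONST stands for whatever sequence the translator uses to materialise a 32-bit constant. It is an illustration of the idea, not the technique of [43].

```python
ARM_PC_READ_OFFSET = 8  # in ARM state, R15 reads as instruction address + 8


def rewrite_pc_relative_load(original_address, rd, imm):
    """Rewrite `LDR rd, [PC, #imm]` found at original_address in guest code.

    The literal's address is computed from the original location, so the
    rewritten sequence no longer depends on where the code cache places it.
    """
    literal_address = original_address + ARM_PC_READ_OFFSET + imm
    return [
        ("LOAD_CONST", rd, literal_address),  # rd := address of the literal
        ("LDR", rd, rd, 0),                   # rd := memory[rd + 0]
    ]


for op in rewrite_pc_relative_load(0x8000, "R0", 0x10):
    print(op)
```

Because the constant is fixed at translation time, the rewritten sequence can execute natively from the code cache without trapping to the VMM.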
5.2.2. Comparing DBT with hardware extensions
One of the common arguments against the use of DBT
in a VMM is that it introduces substantial complexity. In
reality, hardware extensions introduce similar complexity,
but on the hardware level. Although they make it possible
to construct a smaller VMM, thereby decreasing the po-
tential for bugs and vulnerabilities, the design complexity
of hardware extensions is illustrated by the fact that early
implementations were outperformed by software [1].
An argument that speaks in favour of DBT is its versa-
tility. Virtualization is often considered as a solution to
software portability issues across different architectures.
Hardware extensions merely recreate this problem at a
different level. On the contrary, DBT is more versatile
since it transparently enables:
• Optimizations across the border between OS and
applications: DBT can be used to optimize and re-
duce context switching between an OS and its appli-
cations under a VMM, as many operations become
redundant when executed on virtual hardware [1].
• Legacy emulation and optimization: DBT can be
used as a glue layer to emulate legacy hardware
platforms and to resolve backward incompatibility
between generations of architectures [18]. Such
emulation is useful as today’s hardware has become more
flexible than software [21, 27]: system-on-chips have
an average lifetime of 4 years, while software stacks
often have to last much longer. Because of imple-
mentation differences between system-on-chips and
evolutions in hardware architecture, software must
be continuously rewritten and recertified for newer
platforms.
• Full system instrumentation: entire software stacks
may be instrumented from a VMM similar to the ap-
proach used in PinOS [17].
• Load balancing in heterogeneous multi-core sys-
tems: DBT can be used to translate code from one
core to another if cores have different ISAs [33,
55]. If cores share the same ISA, such as in ARM’s
big.LITTLE [9], hardware extensions remain a feasi-
ble approach to virtualization. But even in this case,
DBT can potentially improve performance by dynamically
optimizing code for the micro-architectural features
of the target architecture, such as a static branch
predictor in little cores.
6. Conclusions
Despite the popularity of paravirtualization in today’s
embedded systems, full virtualization remains important.
The theory introduced by Popek and Goldberg is still use-
ful for determining whether an architecture is suitable for
the construction of efficient VMMs for full virtualization,
which are based on the execute-to-trap principle. How-
ever, their model does not take into account features of-
ten found in modern architectures. Therefore we have ex-
tended their model with paged virtual memory by intro-
ducing the concept of an address map, and we derived a
new formal constraint for the correctness of such maps.
We also studied the effect of IO and events, and updated
the model, definitions and results accordingly.
Our model can be applied to analyse modern computer
architectures. We have demonstrated this in a formal
analysis of ARMv7-A, which proved to be not classically
virtualizable. Nevertheless, modern techniques in the
construction of VMMs such as DBT can enable full
virtualization on ARM. DBT requires neither hardware
changes nor guest modification and provides a solution
to the lack of classic virtualizability on architectures
such as ARM. On other architectures, DBT has already
been shown to be able to match or even outperform
the hardware-assisted virtualization approach. However,
it comes at the price of software implementation complexity
and an increased memory footprint. On the other hand, DBT
technology opens up further development possibilities
such as cross-architecture virtualization, legacy software
stack support and better utilisation of heterogeneous
multicore systems.
In the meantime, industry has started to adapt to the
interest in full virtualization. Hardware virtualization
extensions attempt to make ARM compatible with the Popek
and Goldberg classic virtualization requirements, and
implement certain parts of the VMM functionality in
hardware for efficiency reasons, sacrificing portability.
Utilising the hardware functionality can reduce VMM design
complexity at the cost of moving the complexity to the
design of the hardware layer.
Performance measurements of fully virtualized ARM
systems cannot yet be collected, as the market is still
awaiting the first devices to implement the upcoming
virtualization extensions, and we are not aware of any software
solutions at the time of writing. However, we have shown
that it is important to evaluate all methods of virtualiza-
tion not only on grounds of performance, but also func-
tionality, versatility and costs.
7. Future work
Although our model can be used to determine whether
modern computer architectures are suitable for the con-
struction of efficient execute-to-trap VMMs, it does not
lead to any conclusions on whether or not an architecture
can support more than one guest. For example, routing
of asynchronous events to VMM and guests may prove to
be difficult if such events have priorities, as VMMs have
to make sure lower-priority guests cannot block higher-
priority guests.
Another issue that remains untouched by our model is
multi-threading of privileged code, which may happen on
multi-core and multi-threading architectures.
We are currently working on a full virtualization
solution for ARM based on DBT, which will serve as a research
platform for the opportunities discussed in Section 5.2.
References
[1] K. Adams, O. Agesen, A comparison of software and hardware
techniques for x86 virtualization, in: Proceedings of the 12th in-
ternational conference on Architectural Support for Programming
Languages and Operating Systems, ASPLOS-XII, ACM, New York,
NY, USA, 2006, pp. 2–13.
[2] AMD, AMD64 Virtualization Codenamed “Pacifica” Technology –
Secure Virtual Machine Reference Manual, 3.01 edition, 2005.
[3] Z. Amsden, D. Arai, D. Hecht, A. Holler, P. Subrahmanyam, VMI:
an interface for paravirtualization, in: Proceedings of the Linux
Symposium, volume 2 of 2006 Linux Symposium, pp. 371–386.
[4] ARM Architecture Group, ARM® Architecture Reference Manual:
ARM®v7-A and ARM®v7-R edition – errata markup, ARM Limited,
ARM DDI 0406B_errata_2010_Q3 (ID100710) edition, 2010.
[5] ARM Architecture Group, ARM® Generic Timer Specification,
ARM Limited, PRD03-GENC-009660 9.0x edition, 2010.
[6] ARM Architecture Group, ARM® Performance Monitoring Archi-
tecture version 2 Virtualization Extensions, ARM Limited, DSA09-
PRDC-010447 6.0x edition, 2010.
[7] ARM Architecture Group, Large Physical Address Extensions
Specification, ARM Limited, PRD03-GENC-008469 15.0x edition,
2010.
[8] ARM Architecture Group, Virtualization Extensions Architecture
Specification, ARM Limited, PRD03-GENC-008353 14.0x edition,
2010.




[10] F. Armand, J. Berniolles, J.L. Lawall, G. Muller, Automating the port
of Linux to the VirtualLogix hypervisor using semantic patches,
in: 4th European Congress ERTS EMBEDDED REAL TIME SOFT-
WARE, ERTS 2008, pp. 1–7.
[11] F. Armand, M. Gien, A practical look at micro-kernels and virtual
machine monitors, in: 6th IEEE Consumer Communications and
Networking Conference, CCNC 2009, IEEE, Piscataway, New Jer-
sey, USA, 2009, pp. 1–7.
[12] B Labs Ltd., Codezero project overview, http://www.l4dev.org/
codezero_overview, 2010.
[13] V. Bala, E. Duesterwald, S. Banerjia, Transparent dynamic opti-
mization: the design and implementation of Dynamo, Technical
Report, Hewlett Packard Laboratories Technical Report HPL-1999-
78, 1999.
[14] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho,
R. Neugebauer, I. Pratt, A. Warfield, Xen and the art of virtualiza-
tion, SIGOPS Operating Systems Review 37 (2003) 164–177.
[15] K. Barr, P. Bungale, S. Deasy, V. Gyuris, P. Hung, C. Newell, H. Tuch,
B. Zoppis, The VMware mobile virtualization platform: is that a
hypervisor in your pocket?, SIGOPS Operating Systems Review 44
(2010) 124–135.
[16] F. Bellard, QEMU, a fast and portable dynamic translator, in:
Proceedings of the USENIX 2005 Annual Technical Conference,
FREENIX Track, USENIX ’05, USENIX Association, Berkeley, CA,
USA, 2005, pp. 41–46.
[17] P.P. Bungale, C.K. Luk, PinOS: a programmable framework for
whole-system dynamic instrumentation, in: Proceedings of the 3rd
international conference on Virtual Execution Environments, VEE
’07, ACM, New York, NY, USA, 2007, pp. 137–147.
[18] C. Cifuentes, V. Malhotra, Binary translation: static, dynamic, re-
targetable?, in: Proceedings of the 1996 International Conference
on Software Maintenance, ICSM ’96, IEEE, 1996, pp. 340–349.
[19] CORDIS, Report on the EC Workshop on Virtualisation,




[20] H. Dong, Q. Hao, Extension to the model of a virtualizable com-
puter and analysis on the efficiency of a virtual machine, in: Second
International Conference on Computer Modeling and Simulation,
volume 2 of ICCMS 2010, IEEE Computer Society, Los Alamitos,
California, USA, 2010, pp. 503–507.
[21] M. Duranton, S. Yehia, B. De Sutter, K. De Bosschere, A. Co-
hen, B. Falsafi, G. Gaydadjiev, M. Katevenis, J. Maebe, H. Munk,
N. Navarro, A. Ramirez, O. Temam, M. Valero, The HiPEAC vision,
2010.
[22] D.R. Ferstay, Fast Secure Virtualization for the ARM Platform, Mas-
ter’s thesis, The University of British Columbia, Faculty of Graduate
Studies (Computer Science), 2006.
[23] S. Furber, ARM system-on-chip architecture,
Addison-Wesley, Boston, Massachusetts, USA, 2000,
second edition, p. 39.
[24] J. Gallard, A. Lèbre, G. Vallée, C. Morin, P. Gallard, S.L. Scott, Re-
finement proposal of the Goldberg’s theory, in: Proceedings of the
9th International Conference on Algorithms and Architectures for
Parallel Processing, ICA3PP ’09, Springer-Verlag, Berlin - Heidel-
berg, Germany, 2009, pp. 853–865.
[25] R.P. Goldberg, Architecture of virtual machines, in: Proceedings of
the workshop on virtual computer systems, ACM, New York, NY,
USA, 1973, pp. 74–112.
[26] K. Hazelwood, A. Klauser, A dynamic binary instrumentation en-
gine for the ARM architecture, in: Proceedings of the 2006 inter-
national conference on Compilers, architecture and synthesis for
embedded systems, CASES ’06, ACM, New York, NY, USA, 2006,
pp. 261–270.
[27] T. Heinz, R. Wilhelm, Towards device emulation code generation,
in: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference
on Languages, compilers, and tools for embedded systems, LCTES
’09, ACM, New York, NY, USA, 2009, pp. 109–118.
[28] G. Heiser, The role of virtualization in embedded systems, in: Pro-
ceedings of the 1st workshop on Isolation and Integration in Em-
bedded Systems, IIES ’08, ACM, New York, NY, USA, 2008, pp. 11–
16.
[29] G. Heiser, The Motorola Evoke QA4—A Case Study in Mobile Vir-
tualization, Technology White Paper, Open Kernel Labs, 2009.
[30] J. Honeycutt, Microsoft Virtual PC 2004 Technical Overview, 2003.
[31] J.Y. Hwang, S.B. Suh, S.K. Heo, C.J. Park, J.M. Ryu, S.Y. Park, C.R.
Kim, Xen on ARM: System virtualization using Xen hypervisor for
ARM-based secure mobile phones, in: 5th IEEE Consumer Com-
munications and Networking Conference, CCNC 2008, IEEE, Pis-
cataway, New Jersey, USA, 2008, pp. 257–261.
[32] IBM, PowerPC® Microprocessor Family: The Programming Envi-
ronments Manual for 64-bit Microprocessors, International Busi-
ness Machines Corporation, 3.0 edition, 2005.
[33] IBM, PowerVM Lx86 for x86 Linux applications, 2011.
http://www.ibm.com/developerworks/linux/lx86/index.html.
[34] H. Inoue, A. Ikeno, M. Kondo, J. Sakai, M. Edahiro, VIRTUS:
a new processor virtualization architecture for security-oriented
next-generation mobile terminals, in: Proceedings of the 43rd an-
nual Design Automation Conference, DAC ’06, ACM, New York,
NY, USA, 2006, pp. 484–489.
[35] Intel, Intel® 64 and IA-32 Architectures Software Developer’s Man-
ual, Volume 2: Instruction Set Reference, A-Z, Intel Corporation,
2011.
[36] Intel, Intel® 64 and IA-32 Architectures Software Developer’s Man-
ual, Volume 3: System Programming Guide, Intel Corporation,
2011.
[37] S.M. Lee, S.B. Suh, J.D. Choi, Fine-grained I/O access control based
on Xen virtualization for 3G/4G mobile devices, in: Proceedings of
the 47th Design Automation Conference, DAC ’10, ACM, New York,
NY, USA, 2010, pp. 108–113.
[38] J. LeVasseur, V. Uhlig, Y. Yang, M. Chapman, P. Chubb, B. Leslie,
G. Heiser, Pre-virtualization: soft layering for virtual machines, in:
13th Asia-Pacific Computer Systems Architecture Conference, AC-
SAC 2008, IEEE Computer Society, Los Alamitos, California, USA,
2008, pp. 1–9.
[39] J.R. Levine, Linkers and Loaders, Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, 1st edition, 1999.
[40] MIPS, MIPS® Architecture For Programmers Volume II-A: The
MIPS32® Instruction Set, MIPS Technologies, Inc., MD00086, 3.02
edition, 2011.
[41] MIPS, MIPS® Architecture For Programmers Volume II-A: The
MIPS64® Instruction Set, MIPS Technologies, Inc., MD00087, 3.02
edition, 2011.
[42] MIPS, MIPS® Architecture for Programmers Volume II-B: The mi-
croMIPS32™ Instruction Set, MIPS Technologies, Inc., MD00582,
3.05 edition, 2011.
[43] R.W. Moore, J.A. Baiocchi, B.R. Childers, J.W. Davidson, J.D. Hiser,
Addressing the challenges of DBT for the ARM architecture, in:
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on
Languages, compilers, and tools for embedded systems, LCTES ’09,
ACM, New York, NY, USA, 2009, pp. 147–156.
[44] G.J. Popek, R.P. Goldberg, Formal requirements for virtualizable
third generation architectures, Communications of the ACM 17
(1974) 412–421.
[45] M. Rosenblum, VMware’s virtual platform: a virtual machine monitor
for commodity PCs, in: Hot Chips 11 (1999).
[46] R. Russell, virtio: towards a de-facto standard for virtual I/O de-
vices, SIGOPS Operating Systems Review 42 (2008) 95–103.
[47] B. Smith, ARM and Intel battle over the mobile chip’s future, Com-
puter 41 (2008) 15–18.
[48] J. Smith, R. Nair, Virtual Machines: Versatile Platforms for Systems
and Processes, The Morgan Kaufmann Series in Computer Archi-
tecture and Design, Morgan Kaufmann Publishers Inc., San Fran-
cisco, California, USA, 2005.
[49] SuperH, SuperH™ (SH) 64-Bit RISC Series: SH-5 CPU Core, Vol-
ume 1: Architecture, SuperH, Inc., 05-cc-10001, v1.0 edition, 2002.
[50] R. Uhlig, G. Neiger, D. Rodgers, A. Santoni, F. Martins, A. Ander-
son, S. Bennett, A. Kagi, F. Leung, L. Smith, Intel virtualization tech-
nology, Computer 38 (2005) 48–56.
[51] P. Varanasi, G. Heiser, Hardware-supported virtualization on
ARM, in: Proceedings of the 2nd ACM SIGOPS Asia-Pacific Work-
shop on Systems, APSys 2011, ACM, New York, NY, USA, 2011, pp.
11:1–11:5.
[52] VMware, Understanding full virtualization, paravirtualization,
and hardware assist, 2007. White paper.
[53] J. Watson, VirtualBox: bits and bytes masquerading as machines,
Linux Journal 2008 (2008).
[54] A. Whitaker, M. Shaw, S.D. Gribble, Denali: Lightweight Virtual
Machines for Distributed and Networked Applications, Technical
Report 02-02-01, University of Washington, Seattle, Washington,
USA, 2002.
[55] Y. Wu, S. Hu, E. Borin, C. Wang, A HW/SW co-designed heteroge-
neous multi-core virtual machine for energy-efficient general pur-
pose computing, in: 9th Annual IEEE/ACM International Sympo-
sium on Code Generation and Optimization, CGO 2011, IEEE, IEEE
Computer Society, Piscataway, New Jersey, USA, 2011, pp. 236–245.
