Abstract. The isolation of security-critical components from an untrusted OS makes it possible both to protect applications and to harden the OS itself, for instance by run-time monitoring. Virtualization of the memory subsystem is a key component in providing such isolation. We present the design, implementation and verification of a virtualization platform for the ARMv7-A processor family. Our design is based on direct paging, an MMU virtualization mechanism previously introduced by Xen for the x86 architecture, and used later with minor variants by the Secure Virtual Architecture, SVA. We show that the direct paging mechanism can be implemented using a compact design, suitable for formal verification down to a low level of abstraction, without penalizing system performance. The verification is performed using the HOL4 theorem prover and uses a detailed model of the ARMv7-A ISA, including the MMU. We prove memory isolation of the hosted components along with information flow security for an abstract top level model of the virtualization mechanism. The abstract model is refined down to a HOL4 transition system closely resembling a C implementation. The virtualization mechanism is demonstrated on real hardware via a hypervisor capable of hosting Linux as an untrusted guest.
Introduction
A basic security requirement for systems that allow software to execute at different levels of security is memory isolation: the ability to store secret information within a designated part of memory and to prevent the contents of this memory from being affected by, or leaked to, parts of the system that are not authorized to access it. Without the use of special hardware, trustworthy memory isolation relies on the correct implementation of the OS kernel. However, given the size and complexity of modern OSs, the vision of comprehensive and formal commodity OS verification is as distant as ever.
An alternative to verifying the entire OS is to delegate critical functionality to special low-level execution platforms such as hypervisors, separation kernels, or microkernels. Such an approach has some significant advantages. First, the size and complexity of the execution platform can be made much smaller, potentially opening the way to rigorous verification. The literature has many recent examples of this, in seL4 [16], Microsoft's Hyper-V project [17], Green Hills' CC certified INTEGRITY-178B separation kernel [22], and the PROSPER separation kernel [10]. Second, the platform can be opened up to public scrutiny and certification, independent of application stacks. Virtualization-like mechanisms can also be used to support various forms of application hardening against untrusted OSs. Examples of this include KCoFI [7] based on the Secure Virtual Architecture [9], Overshadow [5], Inktag [14], and Virtual Ghost [8]. All these cases rely crucially on memory isolation to provide the required security guarantees, typically by virtualizing the memory management unit (MMU) hardware. MMU virtualization, however, can be exceedingly tricky to get right, motivating the use of formal methods for its verification.
In this paper we present an MMU virtualization API for the ARMv7 family of processors (one of the most widely adopted architectures for embedded devices) that has been formally verified down to a low level of abstraction. The API uses direct paging, a virtualization mechanism introduced by Xen [4] and used later with some variations by the Secure Virtual Architecture [9]. In direct paging, page tables are kept in guest memory and the untrusted guest OS is allowed to read and directly manipulate them (when they are not in active use by the MMU). Xen demonstrated that this approach has better performance than other software virtualization approaches (e.g. shadow page tables) on the x86 architecture [4]. Moreover, since direct paging does not require shadow data structures, this approach has a small memory overhead. The engineering challenge we posed ourselves was to design a minimal API that (i) is sufficiently expressive to host a paravirtualized Linux, (ii) introduces an acceptable overhead, and (iii) has an implementation sufficiently small to be subject to pervasive verification for a commodity CPU architecture such as ARMv7.
The security objective is to allow a malicious guest system to operate freely, invoking the hypervisor at will, without being able to access memory or processor resources that the guest has not received static permission for. The verification is performed using a formal model of the ARMv7 architecture [11], implemented in the HOL4 interactive theorem prover.
The verification is built on top of a model, the top level specification (TLS), which describes the ideal behavior of the hypervisor's handlers implementing the virtualization mechanism, alternating with user mode execution under control of a possibly malicious guest. Part of the security state is stored in a model state, by construction outside the reach of the guest. However, page tables are stored in memory. This is a key complication forced by the direct paging approach, and the solution to this problem is a key contribution of the paper. The upshot is that it is no longer self-evident that the desired memory isolation properties, non-exfiltration and non-infiltration in the terminology of [13], hold for the TLS, and an important part of the verification is therefore to formally validate this fact.
To keep the TLS as simple and abstract as possible, the TLS addresses page tables directly using their physical addresses. A real implementation cannot do this, but must appeal to virtual addresses instead, in addition to managing its internal data structures. To this end we introduce an implementation model, essentially a state transition model operating on the real ARMv7-A state through transitions that directly reflect handler execution at the binary level. We exhibit a refinement from the TLS to the implementation model, prove its correctness, and show, as a corollary, that the memory isolation properties proved at top level transfer to the implementation level.
The verification highlighted three classes of bugs in the initial design of the virtualization mechanism:
(i) Arithmetic overflows, bit field and offset mismatches, and signed operators where unsigned ones were needed. (ii) Missing checks for self-referencing page tables. (iii) Approval of guest requests that cause unpredictable behavior of the ARMv7 MMU.
Moreover, the verification of the implementation model identified additional bugs exploitable by requesting the validation of physical blocks residing outside the guest memory. This last class of bugs was identified because the implementation model takes into account the virtual memory mapping used by the handlers. We report on a port of Linux kernel 2.6.34 and demonstrate the prototype implementation of a hypervisor for which the core component is the verified MMU virtualization API. Experiments demonstrate that the hypervisor can run with reasonable performance on real hardware (Beagleboard-xM based on the Cortex-A8 CPU).
Related Work
The ability to isolate security-critical components from an untrusted OS allows non-critical parts of a system to be implemented while the critical software remains adequately protected. This isolation can be used both to protect applications from an untrusted OS and to protect the OS itself from internal threats. For example, KCoFI [7] uses the Secure Virtual Architecture [9] to isolate the OS from a run-time checker. The checker instruments the OS and monitors its activities to guarantee the control-flow integrity of the OS itself. Related examples are application hardening frameworks such as Overshadow [5], Inktag [14], and Virtual Ghost [8]. In all these cases some form of virtualization of the MMU hardware is a critical component in providing the required isolation guarantees.
Shadow page tables (SPT) are a common approach to MMU virtualization. The virtualization layer maintains a shadow copy of the page tables created and maintained by the guest OS. The MMU uses only the shadow pages, which are updated after the virtualization layer validates the OS changes. The Hyper-V hypervisor, which uses shadow pages on x86, has been formally verified using the semi-automated VCC tool [17]. Related work [3, 21] uses shadow page tables to provide full virtualization, including virtual memory, for "baby VAMP", a simplified MIPS, using VCC. This work, along with later work on TLB virtualization for an abstract model of x64 [2], has been verified using Wolfgang Paul's VCC-based simulation framework. The OKL4-microvisor also uses shadow paging to virtualize the memory subsystem [12]. However, this hypervisor has not been verified.
Some modern CPUs provide native hardware support for virtualization. The ARM Virtualization Extensions augment the CPU with a completely new execution mode and provide a two stage address translation. Using this mechanism, the MMU virtualization does not need to be implemented in software. Even though such hardware support can significantly reduce the complexity of the virtualization layer [24], it does not make software based solutions obsolete. For example, the recent Cortex-A5 (used in feature-phones) and the legacy ARM11 cores (used in the 2014 "New Nintendo 3DS") do not make use of such extensions. Today, the IoT and wearable computing are dominated by microcontrollers (e.g. Cortex-M). As the recent Intel Quark demonstrates, the necessity of executing legacy stacks (e.g. Linux) is pushing towards equipping these microcontrollers with an MMU. Quark and the upcoming ARMv8-R both support an MMU and lack two stage page-tables. Furthermore, solutions based on FPGAs and softcores (e.g. LEON) can benefit from software based virtualization since the gates that are not used for virtualization extensions can be used to implement the application specific logic (e.g. digital signal processing, software-defined radio, cryptography).
Our contributions. We present the first trustworthy virtualization mechanism based on "direct paging", an approach inspired by the paravirtualization mechanism of Xen [4]. The design of the platform is sufficiently slim to enable its formal verification without penalizing the system performance. The verification is done down to a detailed model of the architecture, including a detailed model of the ARMv7 MMU. This enables our threat model to consist of an arbitrary guest that can execute any ARMv7 instruction in user mode. We prove complete mediation of the MMU configurations, memory isolation of the hosted components, and information flow correctness.
We demonstrate the platform via a prototype hypervisor that is capable of hosting a Linux system while provably isolating it from other services.
The Memory Virtualization API
The memory virtualization API supports two types of clients: (i) an untrusted commodity OS guest (Linux) running non-critical software (e.g. GUI, browser, server, games), and (ii) a set of trusted services such as controllers that drive physical actuators, run-time monitors, sensor drivers, or cryptographic services.
To support this use case the memory virtualization subsystem needs to provide two main functionalities:
-Isolation of memory resources used by the trusted components.
-Virtualization of the memory subsystem to enable the untrusted OS to dynamically manage its own memory hierarchy, and to enforce access restrictions.
The physical memory region allocated to each type of client is statically defined. Inside its own region the guest OS is free to manage its own memory, and the virtualization API is designed to provide the same guarantees to the guest OS as when it is running in native mode.
Memory Management
The ARMv7 MMU uses a two level translation scheme. We use direct paging [4] to virtualize the memory subsystem. Direct paging allows the guest to allocate the page tables inside its own memory and to directly manipulate them while the tables are not in active use by the MMU. Once the page tables are activated, the hypervisor must guarantee that further updates are possible only via the virtualization API to modify, allocate and free the page tables.
Physical memory is fragmented into blocks of 4 KB. Since L1 and L2 page tables have size 16 KB and 1 KB respectively, an L1 page table is stored in four contiguous physical blocks and a physical block can contain four L2 page tables. We assign a type to each physical block, which can be:
-data: the block can be written by the guest.
-L1 : contains part of an L1 and is not writable in user mode.
-L2 : contains four L2 and is not writable in user mode.
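The block arithmetic above can be made concrete with a small sketch. It is illustrative only: the names `BlockType`, `block_index`, `l1_blocks` and `l2_bases` are hypothetical and not part of the hypervisor, and the 16 KB alignment of an L1 base is an assumption of the sketch.

```python
from enum import Enum

BLOCK_SIZE = 4 * 1024   # physical block size (from the text)
L1_SIZE = 16 * 1024     # size of one L1 page table (from the text)
L2_SIZE = 1 * 1024      # size of one L2 page table (from the text)


class BlockType(Enum):
    DATA = "data"  # writable by the guest
    L1 = "L1"      # holds one quarter of an L1 page table
    L2 = "L2"      # holds four L2 page tables


def block_index(pa: int) -> int:
    """Index of the 4 KB physical block that contains address pa."""
    return pa // BLOCK_SIZE


def l1_blocks(l1_base: int) -> list:
    """Blocks occupied by an L1 table at l1_base (alignment is an assumption)."""
    if l1_base % L1_SIZE != 0:
        raise ValueError("L1 base is assumed to be 16 KB aligned")
    first = block_index(l1_base)
    return [first + i for i in range(L1_SIZE // BLOCK_SIZE)]


def l2_bases(block: int) -> list:
    """Base addresses of the four L2 tables stored in one block."""
    start = block * BLOCK_SIZE
    return [start + i * L2_SIZE for i in range(BLOCK_SIZE // L2_SIZE)]
```

For instance, an L1 table at physical address 0x8000 occupies blocks 8 to 11, and block 2 holds L2 tables at 0x2000, 0x2400, 0x2800 and 0x2C00.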
The virtualization API shown in Figure 1 is very similar to the MMU interface of the Secure Virtual Architecture [9] and consists of 9 hypercalls that select, create, free, map, or unmap memory blocks or page tables.
Enforcing the page type constraints
Each API call needs to validate the page type, guaranteeing that page tables are write-protected. This is illustrated in Figure 2. The table in the center represents the physical memory and stores the virtualization data structures for each physical block: the page type (pt), a flag indicating whether the block is allocated to the guest partition (gm), and a reference counter (rc).
The four topmost blocks contain an L1 page table, whose 4096 entries are depicted by the table L1-A. The top entry of the page table is a section descriptor (T = S) that grants write permission to the guest (AP = (0, w)). This entry points (Add) to the second physical section, which consists of 256 physical blocks. Three other section descriptors of the L1 are depicted in the figure: the first grants write access to the guest, the second gives read-only permission to the guest (0, r), and the third prevents any guest access and enables write permission for the privileged mode (1, w). The last two entries of the L1 are PT-descriptors. These two entries point to two different L2 page tables that are stored in the same physical block.
Fig. 1 (excerpt).
-L1free: Change the type of an L1 block to data
-L2free: Change the type of an L2 block to data
-L1unmap: Clear an entry of an L1 page table
-L2unmap: Clear an entry of an L2 page table
-L1map: Set an entry of an L1 page table
-L2map: Set an entry of an L2 page table
The API calls manipulating an L1 enforce the following policy: any section descriptor that allows the guest to access the memory must point to a section for which every physical block resides in the guest memory space. Moreover, if a descriptor enables the guest to write then each block must be typed data. Finally, all PT-descriptors must point to physical blocks of type L2.
Figure 2 also depicts two additional L1 page tables: L1-B and L1-C. The type of a physical block containing L1-B can be transformed to L1 by invoking L1create. On the other hand, a block containing L1-C is rejected by L1create since the block contains three entries that violate the policy: (i) the first descriptor grants guest write permission over a section which has at least one non-data block, in this case L2, (ii) the second section descriptor allows the guest to access a section of the physical memory in which there exists a block that is outside the guest memory, and (iii) the third entry is a PT-descriptor, but points to a physical block that is not typed L2.
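The L1 policy stated above can be sketched as a per-entry check. This is a simplified illustration, not the verified handler: descriptors are modeled abstractly rather than as ARMv7 bit fields, and `Descriptor`, `l1_entry_ok`, `l1_table_ok`, `pgtype` and `in_guest` are hypothetical names introduced here.

```python
from dataclasses import dataclass

BLOCKS_PER_SECTION = 256  # a section spans 256 physical blocks (from the text)


@dataclass
class Descriptor:
    kind: str                  # "section", "pt" or "fault" (unmapped entry)
    block: int = 0             # first block of the section, or block holding the L2
    guest_read: bool = False
    guest_write: bool = False


def l1_entry_ok(d, pgtype, in_guest):
    """Check one L1 entry against the page-type policy."""
    if d.kind == "fault":
        return True
    if d.kind == "pt":
        # PT-descriptors must point to a block typed L2.
        return pgtype.get(d.block) == "L2"
    if not (d.guest_read or d.guest_write):
        return True  # no guest access: nothing to check
    blocks = range(d.block, d.block + BLOCKS_PER_SECTION)
    # Guest-accessible sections must lie entirely in guest memory.
    if not all(in_guest(b) for b in blocks):
        return False
    # Guest-writable sections must consist of data blocks only.
    if d.guest_write and not all(pgtype.get(b) == "data" for b in blocks):
        return False
    return True


def l1_table_ok(entries, pgtype, in_guest):
    """An L1 candidate is accepted only if every entry satisfies the policy."""
    return all(l1_entry_ok(d, pgtype, in_guest) for d in entries)


# Hypothetical layout: the guest owns blocks 0..1023 and block 300 holds L2 tables.
pgtype = {b: "data" for b in range(1024)}
pgtype[300] = "L2"
in_guest = lambda b: 0 <= b < 1024
```

With this layout, a guest-writable section starting at block 256 is rejected because it covers block 300, which mirrors violation (i) of L1-C.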
The table L2-A depicts the content of a physical block typed L2, which contains four L2 page tables, each consisting of 256 entries. Each hypercall that manipulates an L2 enforces the following policy: if any entry of the four L2 page tables grants access permission to the guest then the pointed block must be in the guest memory. If the entry also enables guest write access then the pointed block must be typed data. For example, a block containing L2-B is rejected by L2create, since the block contains at least two entries that violate the policy.
A naive run-time check of the page-type policy is not efficient, since it requires re-validating the L1 page table whenever the switch hypercall is invoked. To efficiently enforce that only blocks typed data can be written by the guest, the hypervisor maintains a reference counter, which tracks for each block the sum of (i) the number of descriptors providing writable access in user mode to the block, and (ii) the number of PT-descriptors that point to the block.
Fig. 2. Direct-paging mechanism
In Figure 2 we use solid arrows to represent the references that are counted and dashed arrows to represent the other references. The intuition is that a hypercall can change the type of a physical block (e.g. allocate or free a page table) only if the corresponding reference counter is zero.
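A minimal sketch of this reference-counting rule follows. It only illustrates the counting and the zero-counter condition; validation of the block contents, which the real hypercalls also perform, is omitted, and all names (`BlockInfo`, `add_reference`, `drop_reference`, `change_type`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    pt: str = "data"   # page type: "data", "L1" or "L2"
    gm: bool = True    # block is allocated to the guest partition
    rc: int = 0        # number of counted references


def add_reference(info, writable, is_pt_descriptor):
    """Count a new descriptor if it is guest-writable or a PT-descriptor."""
    if writable or is_pt_descriptor:
        info.rc += 1


def drop_reference(info, writable, is_pt_descriptor):
    """Undo add_reference when a counted descriptor is cleared."""
    if writable or is_pt_descriptor:
        if info.rc == 0:
            raise ValueError("reference counter underflow")
        info.rc -= 1


def change_type(info, new_type):
    """Retype a block; refused while counted references remain."""
    if not info.gm or info.rc != 0:
        return False
    info.pt = new_type
    return True


# A block mapped writable cannot become a page table until it is unmapped.
blk = BlockInfo()
add_reference(blk, writable=True, is_pt_descriptor=False)
refused = change_type(blk, "L2")      # rc == 1, request refused
drop_reference(blk, writable=True, is_pt_descriptor=False)
accepted = change_type(blk, "L2")     # rc == 0, request accepted
```

Read-only mappings are not counted in this sketch, matching the dashed arrows of Figure 2.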
Hypervisor Guest Page Table Access
The hypervisor APIs must be able to read and write guest page tables in order to check the soundness of the requests and to apply the corresponding changes. The naive solution requires handlers to change the current page table, enabling a master page table whenever the guest memory must be accessed and then re-enabling the original page table before the guest is restored. This solution is expensive as it requires flushing the TLB and caches. A solution tailored for Unixes can rely on the injective mapping built by the guest, which can be used to access the guest kernel memory. However, in our setting the hosted guest is not trusted, so this solution cannot guarantee that the injective mapping is obeyed by the guest. Instead, our design reserves a subset of the virtual address space for hypervisor use. The hypervisor master page table is built so that this address space is always mapped according to an injective translation (1-to-1), which makes it easy to compute the virtual address for each physical address in the guest memory, similar to the direct memory maps supported by FreeBSD and Linux.
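As an illustration of such an injective translation, the following sketch assumes that the reserved virtual range maps guest physical memory at a fixed offset. The constants and the names `gpa2va` and `va2gpa` are hypothetical; the actual layout of the hypervisor is not given in the text.

```python
GUEST_PA_BASE = 0x8100_0000   # hypothetical start of guest physical memory
GUEST_SIZE = 0x0100_0000      # hypothetical size of guest memory (16 MB)
HV_VA_BASE = 0xF100_0000      # hypothetical start of the reserved virtual range


def gpa2va(pa):
    """Virtual address through which the hypervisor reaches guest address pa."""
    if not GUEST_PA_BASE <= pa < GUEST_PA_BASE + GUEST_SIZE:
        raise ValueError("physical address outside guest memory")
    return HV_VA_BASE + (pa - GUEST_PA_BASE)


def va2gpa(va):
    """Inverse direction: the translation the master page table is assumed to set up."""
    if not HV_VA_BASE <= va < HV_VA_BASE + GUEST_SIZE:
        raise ValueError("virtual address outside the reserved range")
    return GUEST_PA_BASE + (va - HV_VA_BASE)
```

The range check in `gpa2va` matters: a request naming a block outside guest memory must be rejected before any address is computed from it.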
Memory Model and Cache Effects.
The presence of data caches and memory aliasing raises further issues. In ARMv7 CPUs such as the Cortex-A8 the MMU consults the data caches on TLB misses. When a virtual mapping is changed, the hypervisor must in general invalidate the corresponding TLB entries to guarantee that the MMU uses the updated page descriptors. However, the ARM architecture reference manual [1] prescribes only weak cache coherence properties, even for single-core processors. For example, in Cortex-A8 sequential consistency is not guaranteed if the same physical address is accessed with mappings having different cacheability attributes. Thus, without knowledge of the specific processor platform, care must be taken. To ensure that the model remains valid we are forced to apply a conservative cache eviction strategy. For this reason, the hypervisor must flush the cache before accessing data stored by the guest.
More aggressive approaches (e.g. evicting only the necessary physical addresses, or avoiding flushing altogether) may be adopted for some processor implementations, but require a more fine-grained modeling including caches for their justification.
Verification Approach
The TLS models user mode execution of an arbitrary guest system on top of an ARMv7 CPU with MMU support, alternating with abstract handler events. These events model invocations of the hypervisor handlers as atomic transformations $H_a$ operating on an abstract machine state. Abstract states are concrete ARMv7 states extended by auxiliary (model) data such as page types or reference counters that reflect the internal hypervisor state. Handler events represent the execution of ARMv7 instructions at privileged level, in response to exceptions or interrupts. Modeling handler effects as atomic state transformations is possible, since the hypervisor is non-preemptive, i.e. nested exceptions/interrupts are ruled out by the implementation.
TLS Consistency Properties
Since guest systems can directly manipulate inactive page tables, the TLS needs to explicitly store page tables in memory. We must show first that this does not introduce unwanted interference between guest and hypervisor state:
1. The hypervisor must act as a security monitor for the MMU settings. If complete mediation of the MMU settings is violated, then an attacker can bypass the access policies and compromise the security of the entire system.
2. Executions of an arbitrary guest cannot affect the "trusted world", i.e. the parts of the state the guest is not supposed to be able to write, such as non-guest memory, inaccessible processor registers and status flags, and the abstract state. We view this as an integrity property, similar to the non-exfiltration property of [13].
3. Dually, absence of information flow from the "trusted world" to the guest (confidentiality, similar to non-infiltration) must be guaranteed.
These properties, as in [13], are qualitatively different: The integrity property is first-order, and concerns the inability of the guest to directly write some other state variables. Since it is under guest control when and how to invoke the virtualization API, there are plenty of indirect communication channels connecting guests to the hypervisor. For instance, a guest decision to allocate or deallocate an L1 block affects large parts of the hypervisor state, without ever directly writing to any internal hypervisor state variable. Enforcing this is in a sense the very purpose of the hypervisor. On the other hand, the hypervisor should be unable to affect guest state even indirectly: The only desired effects of hypervisor actions should be to allocate/deallocate, map, remap, and unmap virtual memory resources, leaving any observation a guest may make unaffected. This is essentially a second-order information flow property, needed to break guest-to-guest (or guest-to-service) information channels in much the same way as intransitive noninterference is used in [19] to break guest-to-guest channels passing through the scheduler in seL4.
Refinement
Accordingly, the first verification task is to establish the model consistency properties 1, 2 and 3 above. Extending this to an actual implementation, however, requires more work, because of the TLS abstract state, and because the TLS handlers access memory using physical addresses. The virtualization code needs to execute under the same address translation as the guest, in order to minimize the number of context switches. To show implementation soundness we exhibit a refinement property relating TLS states to implementation states. We demonstrate that the refinement relation is preserved by all atomic hypervisor operations: reads and updates of the page tables, and reads and updates of the hypervisor data structures. Moreover, we prove that the refinement relation directly transfers both the consistency properties and the information flow properties of the TLS to the implementation level, completing the overall memory isolation proof.
Processor Model
The verification uses the HOL4 model of ARMv7-A developed at Cambridge [11]. This model has been extensively tested and is phrased in a manner that retains a high resemblance to the pseudocode used by ARM in the architecture reference manual [1]. We have extended the Cambridge model to include MMU functionality. The resulting model gives a highly detailed account of the ISA level instruction semantics at the different privilege levels, including relevant MMU coprocessor effects. It must be noted that the Cambridge ARM model assumes linearizable memory, and so can be used out of the box only for processor and hypervisor implementations that satisfy this property, for instance through adequate cache flushing as discussed in section 3.
We outline the HOL4 ARMv7 model in sufficient detail to make the formal results presented later understandable. An ARMv7 machine state is a record $\sigma = \langle regs, psrs, coregs, mem \rangle \in \Sigma$, where $regs$, $psrs$, $mem$ and $coregs$, respectively, represent the registers, program status registers, memory, and coprocessors. The function $mode(\sigma)$ returns the current privilege execution mode in the state $\sigma$, which can be either $PL0$ (non-privileged or user mode, used by the guest) or $PL1$ (privileged mode, used by the hypervisor). The memory is the function $mem \in 2^{32} \to 2^{8}$. The coprocessor registers $coregs$ control the MMU. System behavior is modeled by the state transition relation $\to_{l \in \{PL0, PL1\}} \subseteq \Sigma \times \Sigma$, where a transition is performed by the execution of an ARM instruction. Non-privileged transitions ($\sigma \to_{PL0} \sigma'$) start from and end in states that are in non-privileged execution mode (i.e. $mode(\sigma) = mode(\sigma') = PL0$). All the other transitions ($\sigma \to_{PL1} \sigma'$) involve at least one state in privileged level. The raising of an exception is modeled by a transition that enables the level $PL1$.
An exception can be raised because: (i) a software interrupt (SWI) is executed, (ii) the current instruction is undefined, or (iii) a memory access is attempted that is disallowed by the MMU. Whenever an exception occurs, the CPU disables the interrupts and jumps to a predefined address in the vector table to transfer the control to the corresponding exception handler.
MMU behavior is modeled by the function $mmu(\sigma, PL, va, req)$, which takes a state $\sigma$, a privilege level $PL$, a virtual address $va$ and an access request $req \in \{rd, wt, ex\}$ (representing read, write and execute accesses) and yields $pa \in 2^{32} \cup \{\bot\}$, where $pa$ is either the translated physical address or $\bot$, meaning that the access is denied. The state transition relation queries the MMU whenever a virtual address is accessed, and raises an exception if the requested access mode is not allowed.
Formalizing the Proof Goals
A TLS state is a tuple $\langle \sigma, h \rangle$, consisting of an ARMv7 state $\sigma$ and an abstract hypervisor state $h$ of the form $\langle pgtype, pgrefs \rangle$, where $pgtype$ indicates memory block types and $pgrefs$ maintains reference counters. Specifically, $pgtype \in 2^{20} \to \{D, L1, L2\}$ tracks the type of each 4 KB physical block; a block can either be ($D$) memory writable from the guest, i.e. a data page, ($L1$) contain an L1 page table, or ($L2$) contain an L2 page table. The map $pgrefs \in 2^{20} \to 2^{30}$ tracks the references to each physical block, as described in Section 3.
The TLS interleaves standard non-privileged transitions with abstract handler invocations. Formally, the TLS transition relation $\langle \sigma, h \rangle \to_{i \in \{0,1\}} \langle \sigma', h' \rangle$ is defined as follows:
-If $\sigma \to_{PL0} \sigma'$ then $\langle \sigma, h \rangle \to_0 \langle \sigma', h \rangle$; instructions executed in non-privileged mode that do not raise exceptions behave equivalently to the standard ARMv7 semantics and do not affect the abstract hypervisor state.
-If $\sigma \to_{PL1} \sigma'$ and $mode(\sigma) = PL0$ then $\langle \sigma, h \rangle \to_1 H_a(\langle \sigma', h \rangle)$; whenever an exception is raised, the hypervisor is executed, modeled by the abstract handler $H_a$.
In our setup the trusted services and the untrusted guest are both executed in non-privileged mode. To distinguish between these two partitions, we use ARM domains. In the ARM architecture, domains are the primary access control mechanism used by the MMU. This mechanism is orthogonal to the CPU execution modes. The architecture provides sixteen domains, each of which can be activated independently. We reserve the domains 2-15 for the secure services.
In the following we use the predicate $S(\sigma)$ to identify whether the active partition is the one hosting the secure services: the predicate holds if at least one of the reserved domains is enabled.
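One way to read this predicate concretely is over the ARMv7 Domain Access Control Register, which holds one 2-bit field per domain. The sketch below is illustrative: it assumes that "enabled" means any field value other than 0b00 (no access), and the function names are hypothetical.

```python
RESERVED_DOMAINS = range(2, 16)   # domains 2-15, reserved for the secure services


def domain_field(dacr, d):
    """2-bit access field of domain d in a 32-bit DACR value."""
    return (dacr >> (2 * d)) & 0b11


def secure_partition_active(dacr):
    """S holds if at least one reserved domain is enabled.

    Assumption of this sketch: a domain counts as enabled when its
    field differs from 0b00 (no access).
    """
    return any(domain_field(dacr, d) != 0b00 for d in RESERVED_DOMAINS)
```

With only domains 0 and 1 enabled (`dacr = 0b0101`) the predicate is false; enabling domain 2 (`0b01 << 4`) makes it true.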
TLS Consistency
We introduce a system invariant $I(\langle \sigma, h \rangle)$ used to constrain the set of consistent initial states of the TLS. The invariant is needed, for instance, to ensure that guests have write access to page tables only when they are inactive. We use $Q_I$ to represent the set of all possible TLS states that satisfy the invariant. We thus need to show:
We say that two states are MMU-equivalent if for any virtual address $va$ the MMU yields the same translation and the same access permissions. Formally, $\sigma \equiv_{mmu} \sigma'$ if and only if $mmu(\sigma, PL, va, req) = mmu(\sigma', PL, va, req)$ for any $va$, $PL$, $req$. Complete mediation (MMU-integrity) is demonstrated by showing that neither the guest nor the secure services are able to directly change the content of the page tables and affect the address translation mechanism.
We use the approach of [13] to analyze the hypervisor data separation properties. The observations of the guest in a state $\langle \sigma, h \rangle$ are represented by the structure $O_g(\langle \sigma, h \rangle) = \langle uregs, cpsr, mem_g, coregs \rangle$ of user registers $uregs$, control register $cpsr$, guest memory $mem_g$ and coprocessor registers $coregs$. The register $cpsr$ and the coprocessor registers are visible to the guest since they directly affect guest behavior, and do not contain any information the guest should not be allowed to see. Evidently, however, all writes to the coprocessor registers must be mediated by the hypervisor.
The remaining part of the state (i.e. the content of the memory locations that are not part of the guest memory, special registers) and, again, the coprocessor registers constitute the secure observations $O_s(\langle \sigma, h \rangle)$ of the state, which guest transitions are not supposed to affect.
The following theorem demonstrates that the context switch between the untrusted guest and the trusted services is not possible without the mediation of the hypervisor. The proof is straightforward, since $S$ only depends on coprocessor registers that are not modifiable in non-privileged mode.
The non-exfiltration property guarantees that a transition executed by the guest does not modify the secure resources:
The non-infiltration property is a non-interference property guaranteeing that guest instructions and hypercalls executed on behalf of the guest do not depend on any information stored in resources not accessible by the guest.
The Implementation Model
A critical problem in verifying low level platforms is that intermediate states of the MMU configuration can break the semantics of the high level language (e.g. C). This is the reason we introduced the implementation model, which is sufficiently detailed to expose misbehavior of the hypervisor accesses to the observable part of the memory (i.e. page tables, guest memory and internal data structures). The implementation interleaves standard non-privileged transitions and hypervisor functionalities. In contrast to the TLS, these functionalities now store their internal data in system memory, accessed by means of virtual addresses. In practice, in the implementation model the hypervisor functionalities are expressed as executable specifications that are, however, very close to the actions executed by an actual machine at instruction semantics level. We demonstrate these differences by comparing two fragments of the TLS and the implementation specifications.
The TLS models the update of a guest page table descriptor as $\sigma'.mem = write_{32}(\sigma.mem, pa, desc)$, where $pa$ is the physical address of the entry, $desc$ is a word representing the new descriptor and $write_{32}$ is a function that yields a new memory having four consecutive bytes updated. At the implementation level the same operation is represented as
$$\text{if } \neg\, mmu(\sigma, PL1, Gpa2va(pa)).wt \text{ then } \bot \text{ else } write_{32}(\sigma.mem,\ mmu(\sigma, PL1, Gpa2va(pa)).pa,\ desc)$$
where $Gpa2va$ is the function used by the hypervisor to compute the virtual address of a physical address that resides in guest memory. This function is statically defined and is the inverse of the injective translation established by the hypervisor master page table. The implementation can fail to match the TLS for two reasons: (i) the current page table can prevent the hypervisor from accessing the computed virtual address, in which case the implementation terminates in a failing state (denoted by $\bot$), or (ii) the current address translation does not respect the expected injective mapping, thus $mmu(\sigma, PL1, Gpa2va(pa)).pa \neq pa$ and the implementation writes to an address that differs from the one updated by the TLS.
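The two failure modes can be illustrated with a toy model in which memory is a dictionary from byte addresses to byte values and the MMU is a function from virtual addresses to a pair of write permission and physical address. This is a sketch under stated assumptions, not the HOL4 specification: the little-endian byte order, the `OFFSET` constant and all function names are introduced here for illustration.

```python
from typing import NamedTuple


class Translation(NamedTuple):
    wt: bool   # write permitted at privileged level
    pa: int    # translated physical address


def write32(mem, pa, desc):
    """Return a new memory with four consecutive bytes updated."""
    new = dict(mem)
    for i in range(4):
        new[pa + i] = (desc >> (8 * i)) & 0xFF   # little-endian: an assumption
    return new


def tls_update(mem, pa, desc):
    """TLS level: write directly at the physical address."""
    return write32(mem, pa, desc)


def impl_update(mem, mmu_pl1, gpa2va, pa, desc):
    """Implementation level: go through the virtual address; None models the failing state."""
    t = mmu_pl1(gpa2va(pa))
    if not t.wt:
        return None
    return write32(mem, t.pa, desc)


OFFSET = 0x1000                      # hypothetical offset of the reserved range
gpa2va = lambda pa: pa + OFFSET
good_mmu = lambda va: Translation(wt=True, pa=va - OFFSET)        # injective mapping respected
denied_mmu = lambda va: Translation(wt=False, pa=va - OFFSET)     # failure mode (i)
skewed_mmu = lambda va: Translation(wt=True, pa=va - OFFSET + 4)  # failure mode (ii)
mem = {}
```

Only under `good_mmu` does `impl_update` produce the same memory as `tls_update`; the refinement proof has to rule out the other two cases.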
The next example shows the difference between access of the reference counter in the TLS and at implementation level. The TLS models this operation as h.refs(b), where b is the physical block. The implementation models the same operation using memory offsets as follows:
This representation is directly reflected in the hypervisor code. For each block, the page type (two bits) and the reference counter (30 bits) are placed contiguously in a word. These words form an array, whose initial virtual address is $tbl_{va}$.
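A minimal C sketch of such a packed word follows. The text fixes only the field widths, so the bit positions chosen here (type in the two most significant bits, counter in the low 30 bits) and all function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define REF_BITS  30u
#define REF_MASK  ((UINT32_C(1) << REF_BITS) - 1u)   /* 0x3FFFFFFF */
#define TYPE_MASK UINT32_C(3)                        /* two-bit page type */

/* Assumed layout: [31:30] page type, [29:0] reference counter. */
static uint32_t pack_entry(uint32_t type, uint32_t refs) {
    assert(type <= TYPE_MASK && refs <= REF_MASK);
    return (type << REF_BITS) | refs;
}

static uint32_t entry_type(uint32_t word) { return word >> REF_BITS; }
static uint32_t entry_refs(uint32_t word) { return word & REF_MASK; }

/* Byte offset of the word for block b, relative to the start of the array
   (one 32-bit word per block). */
static uint32_t entry_offset(uint32_t b) { return 4u * b; }
```

Under this layout the counter of block `b` is obtained by reading the word at the array base plus `entry_offset(b)` and applying `entry_refs`.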
The concrete handlers are represented by a HOL4 function $H_r$ from concrete ARMv7 states to concrete ARMv7 states. The function is the executable specification of the various exception handlers including the MMU functionalities.
Then, implementation behavior is determined by the state transition relation $\to_{i \in \{0,1\}} \subseteq \Sigma \times (\Sigma \cup \{\bot\})$ as follows:
- If $\sigma \to_{PL0} \sigma'$ then $\sigma \to_0 \sigma'$; instructions executed in non-privileged mode that do not raise exceptions behave according to the standard ARMv7 semantics.
- If $\sigma \to_{PL1} \sigma'$ and $mode(\sigma) = PL0$ then $\sigma \to_1 H_r(\sigma')$; whenever an exception is raised, the hypervisor is executed and its behavior is modeled by the function $H_r$.
The Refinement
To show implementation soundness we exhibit a refinement property relating abstract states $\langle \sigma_1, h \rangle$ to concrete states $\sigma_2$. The refinement relation $R$ requires that: (i) the registers and coprocessors contain the same values in both states, (ii) the guest memory contains the same values in both states, (iii) part of the memory of the implementation state contains a mapping of the hypervisor data structures of the TLS state and (iv) the reserved virtual addresses are always mapped equivalently to the master page table. Observations of the guest $O_g$ are defined on concrete states using the hypervisor data structure mapping in analogy with the corresponding observations on abstract states defined above.
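Theorem 6. Let $\langle \sigma_1, h \rangle \in Q_I$ and $\sigma_2 \in \Sigma$ such that $\langle \sigma_1, h \rangle\ R\ \sigma_2$. Let $i \in \{0, 1\}$. Then $\sigma_2 \to_i \sigma_2'$ if and only if $\langle \sigma_1, h \rangle \to_i \langle \sigma_1', h' \rangle$ and $\langle \sigma_1', h' \rangle\ R\ \sigma_2'$.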
Finally, we show that the security properties of the TLS and the refinement relation directly transfer mmu-integrity, non-exfiltration and non-infiltration to the implementation. We use $\Sigma_I$ to represent the space of consistent concrete states: states $\sigma_2$ such that if $\langle \sigma_1, h \rangle\ R\ \sigma_2$ then $I(\langle \sigma_1, h \rangle)$.
Linux support
To evaluate the real-world feasibility of our approach we examine a virtualized Linux guest. The Linux kernel v2.6.34 has been modified to run on top of the hypervisor. This task required modification of architecture-dependent parts of the Linux kernel such as (i) execution modes, (ii) low-level exception routines and (iii) page table management. High-level OS functions such as process, resource and memory management, file system and networking did not require any modifications.
CPU privilege modes. The target CPU includes only two execution modes: privileged and unprivileged (user). As in other approaches based on paravirtualization, the hypervisor executes as privileged, so the Linux kernel has been modified to execute as unprivileged. To separate kernel and user applications, the hypervisor manages two separate unprivileged execution contexts: virtual user and virtual kernel modes. On x86 these virtual modes can be implemented by segmentation limits. This approach is not possible for CPUs that do not provide this feature (e.g. x86 64-bit and ARM). Instead, for kernel-user space isolation we use ARM domains, which implement an access control regime orthogonal to the CPU execution modes. Notice that the main security goal here is not to guarantee this OS-internal isolation, but to maintain the separation between the virtualized components.
CPU exceptions. CPU exceptions such as aborts and interrupts change the processor mode to privileged. These exceptions must therefore be handled in the hypervisor, which after validation can forward them to the unprivileged exception handlers of the Linux kernel. The hypervisor supplies the kernel exception handlers with some privileged data needed to correctly service an ongoing exception (e.g. for a prefetch abort, the privileged fault address and fault status registers are forwarded to the guest). The exception handlers in the Linux kernel have thus been slightly modified to support this.
Memory management. Within the Linux kernel, virtual memory is handled in two layers. The first is platform independent and provides a number of high-level functions to the rest of the kernel. The second layer provides a number of platform-dependent functions to the first layer. To allow virtualization, we modified the second layer to perform a hypercall instead of performing privileged access to the hardware.
Benchmark and evaluation
Runtime overhead. To analyze runtime overhead we use LMBench [18] (of which the fork benchmarks stress the MMU virtualization) running on Linux 2.6.34 with and without virtualization. The outcome, measured on an ARMv7-A Cortex-A8 powered embedded system (BeagleBoard-xM), is presented in Table 1. Additionally, we use the creation (tar) and compression (gzip) of archives as macrobenchmarks. The significant virtualization overhead for the fork benchmarks is due to a large number of simple operations (in this case, write accesses to a page table) being replaced with a large number of expensive hypercalls. It may be possible to reduce this overhead with minimal optimization (e.g. batching).
In Table 1 we also report the overheads measured in [15] of several existing hypervisors for ARM. We point out that these performance numbers have been obtained from different sources, testing different ARM cores, boards and hosted Linux kernels. Moreover, the numbers presented here use a completely unoptimized version of the hypervisor that we believe can be significantly improved.
The verification highlighted a number of bugs in the initial design of the APIs: (i) arithmetic overflow when updating the reference counter, caused by not preventing the guest from creating an unbounded number of references to a physical block, (ii) bit field and offset mismatch, (iii) a missing check that a newly allocated page table prevents the guest from overwriting the page table itself, (iv) usage of a signed shift operator where the unsigned one was necessary and (v) approval of guest requests that cause unpredictable MMU behavior. The verification of the implementation model identified three additional bugs exploitable by the guest by requesting the validation of page tables outside the guest memory.
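Bug classes (i) and (iv) can be illustrated with a short C sketch. It does not reproduce the actual defects or their fixes. The function names are hypothetical and the counter is assumed to be 30 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REF_MAX UINT32_C(0x3FFFFFFF)   /* largest value of a 30-bit counter */
static_assert(REF_MAX == (UINT32_C(1) << 30) - 1u, "counter is 30 bits wide");

/* Class (i): refuse the request instead of letting the counter wrap around.
   Returns false, leaving the counter unchanged, when it is already at its
   maximum. */
static bool ref_inc(uint32_t *refs) {
    if (*refs >= REF_MAX) return false;
    *refs += 1u;
    return true;
}

/* Class (iv): index into an ARMv7 short-descriptor first-level table, taken
   from bits [31:20] of the virtual address. The shift is done on an unsigned
   type. Right-shifting a negative int32_t is implementation-defined in C and
   typically sign-extends, which would give a wrong index for addresses with
   bit 31 set. */
static uint32_t l1_index(uint32_t va) { return va >> 20; }
```

A guest that keeps adding references to one block then has its request rejected once the counter reaches `REF_MAX`, rather than wrapping the counter to zero.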
The project was conducted in three steps. The design, modeling and verification of the APIs for memory virtualization required nine person months. Here, the most expensive tasks have been the verification of Theorems 1 [20] and 6. The C implementation of the APIs and the Linux port has been accomplished in three months. While the implementation team was completing the Linux port the verification team started the verification of the refinement, which has taken three months so far. This work is continuing, in order to complete the verification from the HOL4 implementation level down to assembly.
Concluding remarks
We presented the first hypervisor (i) for a COTS application processor architecture (ARMv7), (ii) whose spatial separation properties have been formally verified, (iii) capable of hosting a Linux system. As an example application, in [6] we used the virtualization mechanism to support a tamper-proof run-time monitor that prevents code injection in an untrusted Linux guest.
The only verified hypervisor in the literature capable of hosting a commodity OS is Microsoft's Hyper-V [17]. However, little detailed information about the Hyper-V internal structure or the Hyper-V verification exercise is publicly available. As part of the Hyper-V verification project, a hypervisor for a simplified, MIPS-like architecture including memory virtualization is described in [3, 21]. However, the relation of the simplified hypervisor to Hyper-V itself is not clear. Like other, unverified, hypervisors for ARM such as the OKL4 microvisor [12], the Hyper-V precursor of Paul et al. uses shadow page tables for MMU virtualization. Our result demonstrates that secure isolation of a commodity OS can be achieved with highly promising performance without requiring either specialized hardware support or shadow data structures. This applies even before assembly-level and cache-related optimizations are performed. This represents the first trustworthy virtualization mechanism based on "direct paging", an approach inspired by the paravirtualization mechanism of Xen.
The implementation model takes into account low-level details (i.e. virtual address translation, bit field manipulation, finite integer arithmetic, accesses to the hypervisor data not mediated by high-level data structures) and represents an executable specification. The model is sufficiently detailed to spot possible errors that arise when the hypervisor uses virtual addresses, and it exactly reflects the control flow of the C implementation. Part of our ongoing research effort is to adapt existing techniques [23] to verify the hypervisor binary code.
