2 research outputs found
Efficient cross-architecture hardware virtualisation
Hardware virtualisation is the provision of an isolated virtual environment that
represents real physical hardware. It enables operating systems, or other system-level
software (the guest), to run unmodified in a âcontainerâ (the virtual machine)
that is isolated from the real machine (the host).
There are many use-cases for hardware virtualisation that span a wide-range
of end-users. For example, home-users wanting to run multiple operating systems
side-by-side (such as running a WindowsÂź operating system inside an OS
X environment) will use virtualisation to accomplish this. In research and development
environments, developers building experimental software and hardware
want to prototype their designs quickly, and so will virtualise the platform
they are targeting to isolate it from their development workstation. Large-scale
computing environments employ virtualisation to consolidate hardware, enforce
application isolation, migrate existing servers or provision new servers.
However, the majority of these use-cases call for same-architecture virtualisation,
where the architecture of the guest and the host machines matchâa situation
that can be accelerated by the hardware-assisted virtualisation extensions
present on modern processors. But, there is significant interest in virtualising
the hardware of different architectures on a host machine, especially in the
architectural research and development worlds.
Typically, the instruction set architecture of a guest platform will be different
to the host machine, e.g. an ARM guest on an x86 host will use an ARM instruction
set, whereas the host will be using the x86 instruction set. Therefore, to
enable this cross-architecture virtualisation, each guest instruction must be emulated
by the host CPUâa potentially costly operation. This thesis presents a
range of techniques for accelerating this instruction emulation, improving over
a state-of-the art instruction set simulator by 2:64x. But, emulation of the guest
platformâs instruction set is not enough for full hardware virtualisation. In fact,
this is just one challenge in a range of issues that must be considered. Specifically,
another challenge is efficiently handling the way external interrupts are
managed by the virtualisation system. This thesis shows that when employing
efficient instruction emulation techniques, it is not feasible to arbitrarily
divert control-flow without consideration being given to the state of the emulated
processor. Furthermore, it is shown that it is possible for the virtualisation
environment to behave incorrectly if particular care is not given to the point
at which control-flow is allowed to diverge. To solve this, a technique is developed
that maintains efficient instruction emulation, and correctly handles
external interrupt sources.
Finally, modern processors have built-in support for hardware virtualisation
in the form of instruction set extensions that enable the creation of an abstract
computing environment, indistinguishable from real hardware. These extensions
enable guest operating systems to run directly on the physical processor,
with minimal supervision from a hypervisor. However, these extensions are
geared towards same-architecture virtualisation, and as such are not immediately
well-suited for cross-architecture virtualisation. This thesis presents a
technique for exploiting these existing extensions, and using them in a cross-architecture
virtualisation setting, improving the performance of a novel cross-architecture
virtualisation hypervisor over state-of-the-art by 2:5x
Evaluating fragment construction policies for sdt systems
Software Dynamic Translation (SDT) systems have been used for program instrumentation, dynamic optimization, security policy enforcement, intrusion detection, and many other uses. To be widely applicable, the overhead (runtime, memory usage, and power consumption) should be as low as possible. For instance, if an SDT system is protecting a web server against possible attacks, but causes 30 % slowdown, a company may need 30 % more machines to handle the web traffic they expect. Consequently, the causes of SDT overhead should be studied rigorously. This work evaluates many alternative policies for the creation of fragments within the Strata SDT framework. In particular, we examine the effects of ending translation at conditional branches; ending translation at unconditional branches; whether to use partial inlining for call instructions; whether to build the target of calls immediately or lazily; whether to align branch targets; and how to place code to transition back to the dynamic translator. We find that effective translation strategies are vital to program performance, improving performance from as much as 28 % overhead, to as little as 3 % overhead on average for the SPEC CPU2000 benchmark suite. We further demonstrate that these translation strategies are effective across several platforms, including Sun SPARC UltraSpar