Pre-Virtualization: Slashing the cost of virtualization
Despite its current popularity, para-virtualization has an enormous cost. Its diversion from the platform architecture abandons many of the benefits that come with pure virtualization (the faithful emulation of the platform API): stable and well-defined platform interfaces, single binaries for kernel and device drivers (and thus lower testing, maintenance, and support costs), and vendor independence. These limitations are accepted as inevitable in exchange for significantly better performance and for virtualization-like behavior on non-virtualizable hardware such as x86.

We argue that these limitations are not inevitable, and present pre-virtualization, which preserves the benefits of full virtualization without sacrificing the performance benefits of para-virtualization. In a semi-automatic step, an OS is prepared for virtualization; the required modifications are orders of magnitude smaller than for para-virtualization. A virtualization module collocated with the guest OS transforms the standard platform API into the respective hypervisor API. The guest OS is still programmed against a common architecture, and the binary remains fully functional on bare hardware. Supporting a new hypervisor or an updated interface requires only the implementation of a single interface mapping. We validated our approach for a variety of hypervisors, on two very different hardware platforms (x86 and Itanium), with multiple generations of Linux as guests. We found that pre-virtualization achieves essentially the same performance as para-virtualization, at a fraction of the engineering cost.
Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free
We argue that there is virtually no practical situation in which one should seek a "theoretically wait-free" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made "practically wait-free". We draw this conclusion based on the most exhaustive study of blocking search data structures to date. We consider (a) different search data structures of different sizes, (b) numerous uniform and non-uniform workloads, representative of a wide range of practical scenarios, with different percentages of update operations, (c) with and without delayed threads, (d) on different hardware technologies, including processors providing HTM instructions. We explain our claim that blocking search data structures are practically wait-free through an analogy with the birthday paradox, revealing that, in state-of-the-art algorithms implementing such data structures, the probability of conflicts is extremely small. When conflicts occur as a result of context switches and interrupts, we show that HTM-based locks enable blocking algorithms to cope with the
Abstract The L4 microkernel [Lie98] provides extremely fast inter-process communication (IPC) [LE97]. One