98 research outputs found

    The construction of high-performance virtual machines for dynamic languages

    Get PDF
    Dynamic languages, such as Python and Ruby, have become more widely used over the past decade. Despite this, the standard virtual machines for these languages have disappointing performance. These virtual machines are slow, not because methods for achieving better performance are unknown, but because their implementation is hard. What makes the implementation of high-performance virtual machines difïŹcult is not that they are large pieces of software, but that there are fundamental and complex interdependencies between their components. In order to work together correctly, the interpreter, just-in-time compiler, garbage collector and library must all conform to the same precise low-level protocols. In this dissertation I describe a method for constructing virtual machines for dynamic languages, and explain how to design a virtual machine toolkit by building it around an abstract machine. The design and implementation of such a toolkit, the Glasgow Virtual Machine Toolkit, is described. The Glasgow Virtual Machine Toolkit automatically generates a just-in-time compiler, integrates precise garbage collection into the virtual machine, and automatically manages the complex inter-dependencies between all the virtual machine components. Two different virtual machines have been constructed using the GVMT. One is a minimal implementation of Scheme; which was implemented in under three weeks to demonstrate that toolkits like the GVMT can enable the easy construction of virtual machines. The second, the HotPy VM for Python, is a high-performance virtual machine; it demonstrates that a virtual machine built with a toolkit can be fast and that the use of a toolkit does not overly constrain the high-level design. Evaluation shows that HotPy outperforms the standard Python interpreter, CPython, by a large margin, and has performance on a par with PyPy, the fastest Python VM currently available

    Draining the Swamp: Micro Virtual Machines as Solid Foundation for Language Development

    Get PDF
    Many of today\u27s programming languages are broken. Poor performance, lack of features and hard-to-reason-about semantics can cost dearly in software maintenance and inefficient execution. The problem is only getting worse with programming languages proliferating and hardware becoming more complicated. An important reason for this brokenness is that much of language design is implementation-driven. The difficulties in implementation and insufficient understanding of concepts bake bad designs into the language itself. Concurrency, architectural details and garbage collection are three fundamental concerns that contribute much to the complexities of implementing managed languages. We propose the micro virtual machine, a thin abstraction designed specifically to relieve implementers of managed languages of the most fundamental implementation challenges that currently impede good design. The micro virtual machine targets abstractions over memory (garbage collection), architecture (compiler backend), and concurrency. We motivate the micro virtual machine and give an account of the design and initial experience of a concrete instance, which we call Mu, built over a two year period. Our goal is to remove an important barrier to performant and semantically sound managed language design and implementation

    Adaptive Lock-Free Data Structures in Haskell: A General Method for Concurrent Implementation Swapping

    Full text link
    A key part of implementing high-level languages is providing built-in and default data structures. Yet selecting good defaults is hard. A mutable data structure's workload is not known in advance, and it may shift over its lifetime - e.g., between read-heavy and write-heavy, or from heavy contention by multiple threads to single-threaded or low-frequency use. One idea is to switch implementations adaptively, but it is nontrivial to switch the implementation of a concurrent data structure at runtime. Performing the transition requires a concurrent snapshot of data structure contents, which normally demands special engineering in the data structure's design. However, in this paper we identify and formalize an relevant property of lock-free algorithms. Namely, lock-freedom is sufficient to guarantee that freezing memory locations in an arbitrary order will result in a valid snapshot. Several functional languages have data structures that freeze and thaw, transitioning between mutable and immutable, such as Haskell vectors and Clojure transients, but these enable only single-threaded writers. We generalize this approach to augment an arbitrary lock-free data structure with the ability to gradually freeze and optionally transition to a new representation. This augmentation doesn't require changing the algorithm or code for the data structure, only replacing its datatype for mutable references with a freezable variant. In this paper, we present an algorithm for lifting plain to adaptive data and prove that the resulting hybrid data structure is itself lock-free, linearizable, and simulates the original. We also perform an empirical case study in the context of heating up and cooling down concurrent maps.Comment: To be published in ACM SIGPLAN Haskell Symposium 201

    Micro Virtual Machines: A Solid Foundation for Managed Language Implementation

    Get PDF
    Today new programming languages proliferate, but many of them suffer from poor performance and inscrutable semantics. We assert that the root of many of the performance and semantic problems of today's languages is that language implementation is extremely difficult. This thesis addresses the fundamental challenges of efficiently developing high-level managed languages. Modern high-level languages provide abstractions over execution, memory management and concurrency. It requires enormous intellectual capability and engineering effort to properly manage these concerns. Lacking such resources, developers usually choose naive implementation approaches in the early stages of language design, a strategy which too often has long-term consequences, hindering the future development of the language. Existing language development platforms have failed to provide the right level of abstraction, and forced implementers to reinvent low-level mechanisms in order to obtain performance. My thesis is that the introduction of micro virtual machines will allow the development of higher-quality, high-performance managed languages. The first contribution of this thesis is the design of Mu, with the specification of Mu as the main outcome. Mu is the first micro virtual machine, a robust, performant, and light-weight abstraction over just three concerns: execution, concurrency and garbage collection. Such a foundation attacks three of the most fundamental and challenging issues that face existing language designs and implementations, leaving the language implementers free to focus on the higher levels of their language design. The second contribution is an in-depth analysis of on-stack replacement and its efficient implementation. This low-level mechanism underpins run-time feedback-directed optimisation, which is key to the efficient implementation of dynamic languages. The third contribution is demonstrating the viability of Mu through RPython, a real-world non-trivial language implementation. We also did some preliminary research of GHC as a Mu client. We have created the Mu specification and its reference implementation, both of which are open-source. We show that that Mu's on-stack replacement API can gracefully support dynamic languages such as JavaScript, and it is implementable on concrete hardware. Our RPython client has been able to translate and execute non-trivial RPython programs, and can run the RPySOM interpreter and the core of the PyPy interpreter. With micro virtual machines providing a low-level substrate, language developers now have the option to build their next language on a micro virtual machine. We believe that the quality of programming languages will be improved as a result

    Tracing vs. Partial Evaluation: Comparing Meta-Compilation Approaches for Self-Optimizing Interpreters

    Get PDF
    Tracing and partial evaluation have been proposed as meta-compilation techniques for interpreters to make just-in-time compilation language-independent. They promise that programs executing on simple interpreters can reach performance of the same order of magnitude as if they would be executed on state-of-the-art virtual machines with highly optimizing just-in-time compilers built for a specific language. Tracing and partial evaluation approach this meta-compilation from two ends of a spectrum, resulting in different sets of tradeoffs. This study investigates both approaches in the context of self-optimizing interpreters, a technique for building fast abstract-syntax-tree interpreters. Based on RPython for tracing and Truffle for partial evaluation, we assess the two approaches by comparing the impact of various optimizations on the performance of an interpreter for SOM, an object-oriented dynamically-typed language. The goal is to determine whether either approach yields clear performance or engineering benefits. We find that tracing and partial evaluation both reach roughly the same level of performance. SOM based on meta-tracing is on average 3x slower than Java, while SOM based on partial evaluation is on average 2.3x slower than Java. With respect to the engineering, tracing has however significant benefits, because it requires language implementers to apply fewer optimizations to reach the same level of performance

    Efficient cross-architecture hardware virtualisation

    Get PDF
    Hardware virtualisation is the provision of an isolated virtual environment that represents real physical hardware. It enables operating systems, or other system-level software (the guest), to run unmodified in a “container” (the virtual machine) that is isolated from the real machine (the host). There are many use-cases for hardware virtualisation that span a wide-range of end-users. For example, home-users wanting to run multiple operating systems side-by-side (such as running a Windows¼ operating system inside an OS X environment) will use virtualisation to accomplish this. In research and development environments, developers building experimental software and hardware want to prototype their designs quickly, and so will virtualise the platform they are targeting to isolate it from their development workstation. Large-scale computing environments employ virtualisation to consolidate hardware, enforce application isolation, migrate existing servers or provision new servers. However, the majority of these use-cases call for same-architecture virtualisation, where the architecture of the guest and the host machines match—a situation that can be accelerated by the hardware-assisted virtualisation extensions present on modern processors. But, there is significant interest in virtualising the hardware of different architectures on a host machine, especially in the architectural research and development worlds. Typically, the instruction set architecture of a guest platform will be different to the host machine, e.g. an ARM guest on an x86 host will use an ARM instruction set, whereas the host will be using the x86 instruction set. Therefore, to enable this cross-architecture virtualisation, each guest instruction must be emulated by the host CPU—a potentially costly operation. This thesis presents a range of techniques for accelerating this instruction emulation, improving over a state-of-the art instruction set simulator by 2:64x. But, emulation of the guest platform’s instruction set is not enough for full hardware virtualisation. In fact, this is just one challenge in a range of issues that must be considered. Specifically, another challenge is efficiently handling the way external interrupts are managed by the virtualisation system. This thesis shows that when employing efficient instruction emulation techniques, it is not feasible to arbitrarily divert control-flow without consideration being given to the state of the emulated processor. Furthermore, it is shown that it is possible for the virtualisation environment to behave incorrectly if particular care is not given to the point at which control-flow is allowed to diverge. To solve this, a technique is developed that maintains efficient instruction emulation, and correctly handles external interrupt sources. Finally, modern processors have built-in support for hardware virtualisation in the form of instruction set extensions that enable the creation of an abstract computing environment, indistinguishable from real hardware. These extensions enable guest operating systems to run directly on the physical processor, with minimal supervision from a hypervisor. However, these extensions are geared towards same-architecture virtualisation, and as such are not immediately well-suited for cross-architecture virtualisation. This thesis presents a technique for exploiting these existing extensions, and using them in a cross-architecture virtualisation setting, improving the performance of a novel cross-architecture virtualisation hypervisor over state-of-the-art by 2:5x

    An Abstract Interpretation-based Model of Tracing Just-In-Time Compilation

    Get PDF
    Tracing just-in-time compilation is a popular compilation technique for the efficient implementation of dynamic languages, which is commonly used for JavaScript, Python and PHP. We provide a formal model of tracing JIT compilation of programs using abstract interpretation. Hot path detection corresponds to an abstraction of the trace semantics of the program. The optimization phase corresponds to a transform of the original program that preserves its trace semantics up to an observation modeled by some abstraction. We provide a generic framework to express dynamic optimizations and prove them correct. We instantiate it to prove the correctness of dynamic type specialization and constant variable folding. We show that our framework is more general than the model of tracing compilation introduced by Guo and Palsberg [2011] based on operational bisimulations.Comment: To appear in ACM Transactions on Programming Languages and System

    A Flexible Framework for Studying Trace-Based Just-In-Time Compilation

    Get PDF
    Just-in-time compilation has proven an effective, though effort-intensive, choice for realizing performant language runtimes. Recently introduced JIT compilation frameworks advocate applying meta-compilation techniques such as partial evaluation or meta-tracing on simple interpreters to reduce the implementation effort. However, such frameworks are few and far between. Designed and highly optimized for performance, they are difficult to experiment with. We therefore present STRAF, a minimalistic yet flexible Scala framework for studying trace-based JIT compilation. STRAF is sufficiently general to support a diverse set of language interpreters, but also sufficiently extensible to enable experiments with trace recording and optimization. We demonstrate the former by plugging two different interpreters into STRAF. We demonstrate the latter by extending STRAF with e.g., constant folding and type-specialization optimizations, which are commonly found in dedicated trace-based JIT compilers. The evaluation shows that STRAF is suitable for prototyping new techniques and formalisms in the domain of trace-based JIT compilation
    • 

    corecore