
    Hey, You Got Your Language In My Operating System!

    Several projects in the operating systems research community suggest a trend of convergence among features once divided between operating systems and languages. We describe how partial evaluation and transformational programming systems apply to this trend by providing a general framework for application support, from compilation to run-time services. We contend that the community will no longer think of implementing a static collection of services and calling it an operating system; instead, this general framework will allow applications to be flexibly configured, and the "operating system" will simply be the application support that is supplied at run-time.
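
    The technical core of the claim is partial evaluation: specializing general-purpose support code against whatever is known statically. A toy Python illustration (not from the paper; the residual-program generator and its names are hypothetical) of how a general routine collapses into specialized code once part of its input is fixed:

        # power() is fully general; specialize_power() plays the partial
        # evaluator, folding away the recursion and the test on n when n
        # is known ahead of time, leaving only residual multiplications.
        def power(x, n):
            return 1 if n == 0 else x * power(x, n - 1)

        def specialize_power(n):
            body = " * ".join(["x"] * n) if n > 0 else "1"
            return eval(f"lambda x: {body}")   # residual program for this n

        cube = specialize_power(3)             # behaves like lambda x: x * x * x
        print(cube(5), power(5, 3))            # 125 125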

    Simple and Effective Type Check Removal through Lazy Basic Block Versioning

    Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has led to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique that effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. It does not require costly program analyses, is not restricted by the precision limitations of traditional type analyses, and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler and compared it with a classical flow-based type analysis. Lazy basic block versioning performs as well or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity, and good run-time performance makes basic block versioning attractive for baseline JIT compilers.
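
    A minimal Python sketch of the block-versioning idea (hypothetical IR and names; the paper's compiler emits machine code, not Python): each basic block is specialized lazily for the type context it is entered with, and a type test is kept only when the context does not already prove the type.

        from dataclasses import dataclass

        @dataclass
        class Op:
            kind: str        # "is_int" (dynamic type test) or any other opcode
            arg: str = ""

        @dataclass
        class Block:
            id: int
            ops: list

        versions = {}        # (block id, frozen type context) -> specialized ops

        def get_version(block, ctx):
            """Fetch the version of `block` specialized for the incoming type
            context `ctx`, generating it lazily on first entry."""
            key = (block.id, tuple(sorted(ctx.items())))
            if key not in versions:
                versions[key] = specialize(block, dict(ctx))
            return versions[key]

        def specialize(block, ctx):
            code = []
            for op in block.ops:
                if op.kind == "is_int":
                    if ctx.get(op.arg) == "int":
                        continue             # context proves it: test removed
                    code.append(op)          # keep the first test...
                    ctx[op.arg] = "int"      # ...and record what it established
                else:
                    code.append(op)
            return code   # successors get their own versions, with ctx propagated

        b = Block(1, [Op("is_int", "x"), Op("add"), Op("is_int", "x"), Op("add")])
        print(len(get_version(b, {})))            # 3 ops: second test eliminated
        print(len(get_version(b, {"x": "int"})))  # 2 ops: both tests eliminated

    Because versions are keyed by (block, context), hot paths with stable types get test-free code, while cold or polymorphic paths simply accumulate a few more versions.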

    Speculative Staging for Interpreter Optimization

    Interpreters have a bad reputation for having lower performance than just-in-time compilers. We present a new way of building high-performance interpreters that is particularly effective for executing dynamically typed programming languages. The key idea is to combine speculative staging of optimized interpreter instructions with a novel technique of incrementally and iteratively concerting them at run time. This paper introduces the concepts behind deriving optimized instructions from existing interpreter instructions by incrementally peeling off layers of complexity. When compiling the interpreter, these optimized derivatives are compiled along with the original interpreter instructions. Therefore, our technique is portable by construction, since it leverages the existing compiler's backend. At run time we use instruction substitution, from the interpreter's original and expensive instructions to the optimized instruction derivatives, to speed up execution. Our technique unites high performance with the simplicity and portability of interpreters: we report that our optimization makes the CPython interpreter more than four times faster in the best case, where our interpreter closes the gap with, and sometimes even outperforms, PyPy's just-in-time compiler.
    Comment: 16 pages, 4 figures, 3 tables. Uses CPython 3.2.3 and PyPy 1.
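
    The substitution mechanism can be pictured in a few lines of Python (hypothetical instruction set; the paper stages real interpreter instructions in C): a generic instruction observes its operands and rewrites its slot in the code array to a pre-compiled specialized derivative, which guards its speculation and reverts on failure.

        def add_generic(stack, pc, code):
            b, a = stack.pop(), stack.pop()
            if isinstance(a, int) and isinstance(b, int):
                code[pc] = add_int          # substitute the optimized derivative
            stack.append(a + b)             # fully dynamic semantics

        def add_int(stack, pc, code):
            b, a = stack.pop(), stack.pop()
            if isinstance(a, int) and isinstance(b, int):
                stack.append(a + b)         # fast path: no dispatch on types
            else:
                code[pc] = add_generic      # speculation failed: revert
                stack.append(a + b)         # still compute the generic result

        def run(code, stack):
            for pc in range(len(code)):     # straight-line toy "bytecode"
                code[pc](stack, pc, code)
            return stack

        print(run([add_generic, add_generic], [1, 2, 3]))  # [6]; both slots
                                                           # now hold add_int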

    A detailed VM profiler for the Cog VM

    Code profiling enables a user to know where in an application or function the execution time is spent. The Pharo ecosystem offers several code profilers. However, most of the publicly available profilers (MessageTally, Spy, GadgetProfiler) largely ignore the activity carried out by the virtual machine, thus incurring inaccuracy in the gathered information and missing important information, such as just-in-time compiler activity. This paper describes the motivations for and the latest improvements carried out in VMProfiler, a code execution profiler hooked into the virtual machine that performs its analysis by monitoring the virtual machine's execution. These improvements address some limitations related to assessing the activity of native functions (resulting from just-in-time compiler operation): as of now, VMProfiler provides more detailed profiling reports, showing for native-code functions in which bytecode range the execution time is spent.
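
    The reporting improvement boils down to a mapping problem. A hedged Python sketch (invented data layout, not VMProfiler's code): given sampled program-counter offsets inside one JIT-compiled method, attribute each sample to the bytecode range whose machine code covers that offset.

        import bisect
        from collections import Counter

        # Sorted native-code offsets at which a new bytecode range starts,
        # with a label per range (both tables are assumed to come from the
        # JIT's code-generation metadata).
        range_starts = [0x00, 0x40, 0xA0]
        range_labels = ["bytecode 0-15", "bytecode 16-31", "bytecode 32-47"]

        def attribute(samples):
            hits = Counter()
            for off in samples:
                i = bisect.bisect_right(range_starts, off) - 1
                hits[range_labels[i]] += 1   # charge sample to enclosing range
            return hits

        for label, n in attribute([0x04, 0x44, 0x48, 0xB0, 0x41]).most_common():
            print(label, n)                  # "bytecode 16-31" gets 3 samples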

    Array bounds check elimination in the context of deoptimization

    Whenever an array element is accessed, Java virtual machines execute a compare instruction to ensure that the index value is within the valid bounds. This reduces the execution speed of Java programs. Array bounds check elimination identifies situations in which such checks are redundant and can be removed. We present an array bounds check elimination algorithm for the Java HotSpot™ VM based on static analysis in the just-in-time compiler. The algorithm works on an intermediate representation in static single assignment form and maintains conditions for index expressions. It fully removes bounds checks if it can be proven that they never fail. Whenever possible, it moves bounds checks out of loops. The static number of checks remains the same, but a check inside a loop is likely to be executed more often. If such a check fails, the executing program falls back to interpreted mode, avoiding the problem that an exception is thrown at the wrong place. The evaluation shows a speedup near the theoretical maximum for the scientific SciMark benchmark suite and also significant improvements for some Java Grande benchmarks. The algorithm slightly increases the execution speed of the SPECjvm98 benchmark suite. The evaluation of the DaCapo benchmarks shows that array bounds checks do not have a significant impact on the performance of object-oriented applications.
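
    The loop case can be illustrated with a small Python sketch (not HotSpot's algorithm, just the shape of the transformation): a single hoisted check covers every access in the loop, and when it cannot be proven, execution falls back to a slow path with per-access checks, which plays the role deoptimization plays in the VM by raising the exception at exactly the right iteration.

        def sum_range(a, lo, hi):
            if 0 <= lo and hi <= len(a):   # hoisted check covers all i in [lo, hi)
                s = 0
                for i in range(lo, hi):
                    s += a[i]              # per-access checks provably redundant
                return s
            s = 0                          # "deoptimized" slow path
            for i in range(lo, hi):
                if not 0 <= i < len(a):
                    raise IndexError(i)    # thrown at the correct access
                s += a[i]
            return s

        print(sum_range([1, 2, 3, 4], 1, 4))   # fast path: 9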

    Inline Caching Meets Quickening

    Inline caches effectively eliminate the overhead implied by dynamic typing. Yet, inline caching is mostly used in code generated by just-in-time compilers. We present efficient implementation techniques for using inline caches without dynamic translation, thus enabling future interpreter implementers to use this important optimization technique. We report speedups of up to a factor of 1.71, without the additional implementation and maintenance costs incurred by using a just-in-time compiler.
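
    A hedged Python sketch of the combination (invented object model; the paper targets a bytecode interpreter in C, where the cached instruction reads a fixed slot offset instead of calling getattr): the generic attribute load quickens its own slot in the code array into a version that caches the receiver's class and guards on it.

        def load_attr_generic(code, pc, obj, name):
            value = getattr(obj, name)             # full dynamic lookup
            cached_cls = type(obj)

            def load_attr_cached(code, pc, o, n):  # quickened derivative
                if type(o) is cached_cls:          # cheap monomorphic guard
                    return getattr(o, n)           # hit (a slot read in a real VM)
                code[pc] = load_attr_generic       # miss: un-quicken and retry
                return load_attr_generic(code, pc, o, n)

            code[pc] = load_attr_cached            # rewrite the instruction in place
            return value

        class Point:
            def __init__(self):
                self.x = 7

        code = [load_attr_generic]
        p = Point()
        print(code[0](code, 0, p, "x"))   # 7; slot 0 is now the cached version
        print(code[0](code, 0, p, "x"))   # 7; served through the guard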

    Fast Linear Programming through Transprecision Computing on Small and Sparse Data

    A plethora of program analysis and optimization techniques rely on linear programming at their heart. However, such techniques are often considered too slow for production use. While today’s best solvers are optimized for complex problems with thousands of dimensions, linear programming, as used in compilers, is typically applied to small and seemingly trivial problems, but to many instances in a single compilation run. As a result, compilers do not benefit from decades of research on optimizing large-scale linear programming. We design a simplex solver targeted at compilers. A novel theory of transprecision computation, applied from individual elements to full data structures, provides the computational foundation. By carefully combining it with optimized representations for small and sparse matrices and specialized small-coefficient algorithms, we (1) reduce memory traffic, (2) exploit wide vectors, and (3) use low-precision arithmetic units effectively. We evaluate our work by embedding our solver into a state-of-the-art integer set library and implementing one essential operation, coalescing, on top of our transprecision solver. Our evaluation shows more than an order-of-magnitude speedup on the core simplex pivot operation and a mean speedup of 3.2x (vs. GMP) and 4.6x (vs. IMath) for the optimized coalescing operation. Our results demonstrate that our optimizations exploit the wide SIMD instructions of modern microarchitectures effectively. We expect our work to provide foundations for a future integer set library that uses transprecision arithmetic to accelerate compiler analyses.
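
    The pivot-level idea can be sketched in a few lines of Python (invented threshold and data layout; the paper uses tuned low-precision vector kernels, not Python lists): run the elimination step of a simplex pivot in cheap floating point when the pivot is comfortably large, and redo it in exact rational arithmetic when it is not.

        from fractions import Fraction

        def eliminate(tab, r, c):
            piv = tab[r][c]                  # caller picks a nonzero pivot
            row = [x / piv for x in tab[r]]
            return [row if i == r
                    else [x - t[c] * y for x, y in zip(t, row)]
                    for i, t in enumerate(tab)]

        def pivot(tab, r, c):
            if abs(tab[r][c]) >= 1e-6:       # numerically safe: low precision
                return eliminate([[float(x) for x in row] for row in tab], r, c)
            # Unsafe pivot: fall back to exact (slow) rational arithmetic.
            return eliminate([[Fraction(x) for x in row] for row in tab], r, c)

        T = [[2, 1, 4],
             [1, 3, 5]]
        print(pivot(T, 0, 0))   # [[1.0, 0.5, 2.0], [0.0, 2.5, 3.0]]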