23 research outputs found

    Program Compression

    Get PDF
    The talk focused on a grammar-based technique for identifying redundancy in program code and exploiting that redundancy to reduce the memory required to store and execute the program. The idea is to start with a simple context-free grammar that derives all valid basic blocks of any program. We represent a program by the parse trees (i.e., derivations) of its basic blocks under this grammar. We then modify the grammar, guided by sample programs, so that idioms of the language have shorter derivations in the modified grammar. Since each derivation represents a basic block, we can interpret the resulting set of derivations much as we would interpret the original program: we need only expand the grammar rules indicated by the derivation to produce a sequence of original program instructions to execute. The result is a program representation that is approximately 40% of the original program size and is interpretable by a very modest-sized interpreter.
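
    As a rough illustration of the idea (not the authors' actual system), the sketch below treats a derivation as a sequence of rule indices into a grammar table; expanding the rules in order regenerates the original instruction sequence, so the compressed form can be interpreted directly. The grammar, rule layout, and instruction strings are all invented for the example.

    #include <stdio.h>

    typedef struct {
        const char *expansion[4];  /* instructions this rule expands to */
        int length;
    } Rule;

    /* Toy grammar: rule 2 captures a frequent idiom ("load; add"), so
     * blocks using the idiom need one derivation step instead of two. */
    static const Rule grammar[] = {
        { { "load" },        1 },  /* rule 0 */
        { { "store" },       1 },  /* rule 1 */
        { { "load", "add" }, 2 },  /* rule 2: hardwired idiom */
    };

    static void execute_block(const int *derivation, int n) {
        for (int i = 0; i < n; i++) {                  /* walk the derivation */
            const Rule *r = &grammar[derivation[i]];
            for (int j = 0; j < r->length; j++)
                printf("exec %s\n", r->expansion[j]);  /* stand-in for real dispatch */
        }
    }

    int main(void) {
        int block[] = { 2, 1 };  /* encodes "load; add; store" in two steps */
        execute_block(block, 2);
        return 0;
    }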

    Optimizing indirect branch prediction accuracy in virtual machine interpreters

    Get PDF
    Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of their execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the best widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2%–50%. In this paper we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter run-time) variants of these techniques and compare them, along with several combinations of them. These techniques can eliminate nearly all of the dispatch branch mispredictions and have other benefits, resulting in speedups of up to a factor of 3.17 over efficient threaded-code interpreters, and of up to a factor of 1.3 over techniques relying on superinstructions alone.
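
    The replication idea can be sketched in a few lines of C using GCC's computed goto (the usual threaded-code mechanism). Each copy of an instruction ends in its own indirect branch, so the BTB can learn a distinct target for each occurrence in the program. The opcodes and program below are invented for the example.

    #include <stdio.h>

    int main(void) {
        /* Direct-threaded code: each slot holds the address of the next
         * instruction's implementation (requires GCC or Clang). */
        void *prog[] = { &&op_add_0, &&op_add_1, &&op_halt };
        void **ip = prog;
        int acc = 0;

        goto **ip++;           /* initial dispatch */

    op_add_0:                  /* replica 0 of ADD */
        acc += 1;
        goto **ip++;           /* private indirect branch for this site */
    op_add_1:                  /* replica 1 of ADD: same work, own branch */
        acc += 1;
        goto **ip++;
    op_halt:
        printf("acc = %d\n", acc);
        return 0;
    }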

    Towards Superinstructions for Java Interpreters

    Get PDF
    The Java Virtual Machine (JVM) is usually implemented by an interpreter or a just-in-time (JIT) compiler. JITs provide the best performance, but interpreters have a number of advantages that make them attractive, especially for embedded systems: simplicity, portability and lower memory requirements. Instruction dispatch is responsible for most of the running time of efficient interpreters, especially on pipelined processors. Superinstructions are an important optimisation to reduce the number of instruction dispatches: a superinstruction is a new Java instruction that performs the work of a common sequence of instructions. In this paper we describe work in progress on the design and implementation of a system of superinstructions for an efficient Java interpreter for connected devices and embedded systems. We describe our basic interpreter and the interpreter generator we use to automatically create optimised source code for superinstructions, and discuss Java-specific issues relating to superinstructions. Our initial experimental results show that superinstructions can give large speedups on the SPECjvm98 benchmark suite.
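
    A minimal sketch of the effect, assuming a plain switch-dispatched bytecode loop (opcodes and frame layout invented here): the superinstruction ILOAD_IADD does the work of ILOAD followed by IADD with a single dispatch.

    #include <stdio.h>

    enum { ILOAD, IADD, ILOAD_IADD, HALT };

    int main(void) {
        unsigned char code[] = { ILOAD, 0, ILOAD_IADD, 1, HALT };
        int locals[] = { 2, 3 };
        int stack[8], *sp = stack;
        unsigned char *pc = code;

        for (;;) {
            switch (*pc++) {
            case ILOAD:      *sp++ = locals[*pc++]; break;
            case IADD:       sp--; sp[-1] += *sp;   break;
            case ILOAD_IADD:                /* one dispatch, two instructions */
                sp[-1] += locals[*pc++];
                break;
            case HALT:
                printf("result = %d\n", sp[-1]);  /* prints 5 */
                return 0;
            }
        }
    }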

    High performance annotation-aware JVM for Java cards

    Full text link
    Early applications of smart cards focused on personal security. Recently, there has been an increasing demand for networked, multi-application cards. In this new scenario, enhanced application-specific on-card Java applets and complex cryptographic services are executed through the smart card Java Virtual Machine (JVM). To support such computation-intensive applications, contemporary smart cards are designed with built-in microprocessors and memory. Because smart cards are highly area-constrained environments, with memory, CPU and peripherals competing for a very small die space, the VM execution engine of choice is often a small, slow interpreter. At the same time, support for multiple applications and cryptographic services demands a high-performance VM execution engine. This tension necessitates optimizing the JVM for Java Cards.

    Optimizations for a Java Interpreter Using Instruction Set Enhancement

    Get PDF
    Several methods for optimizing Java interpreters have been proposed that involve augmenting the existing instruction set. In this paper we describe the design and implementation of three such optimizations for an efficient Java interpreter. Specialized instructions are new versions of existing instructions with commonly occurring operands hardwired into them, which reduces operand fetching. Superinstructions are new Java instructions that perform the work of common sequences of instructions. Finally, instruction replication is the duplication of existing instructions with a view to improving branch prediction accuracy. We describe our basic interpreter and the interpreter generator we use to automatically create optimised source code for enhanced instructions, and discuss Java-specific issues relating to these optimizations. Experimental results show that significant speedups (up to a factor of 3.3, with realistic average speedups of 30%–35%) are attainable using these techniques.
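
    As a hedged sketch of specialization alone (names invented, not the generator's output): ILOAD_0 hardwires the common operand 0, removing the operand fetch that the generic ILOAD performs on every execution.

    #include <stdio.h>

    enum { ILOAD, ILOAD_0, PRINT2, HALT };

    int main(void) {
        unsigned char code[] = { ILOAD_0, ILOAD, 1, PRINT2, HALT };
        int locals[] = { 7, 35 };
        int stack[8], *sp = stack;
        unsigned char *pc = code;

        for (;;) {
            switch (*pc++) {
            case ILOAD:   *sp++ = locals[*pc++]; break;  /* fetches operand byte */
            case ILOAD_0: *sp++ = locals[0];     break;  /* operand hardwired */
            case PRINT2:  printf("%d %d\n", sp[-2], sp[-1]); break;
            case HALT:    return 0;
            }
        }
    }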

    Optimization of Storage-Referencing Gestures

    Get PDF
    We describe techniques for identifying and optimizing memory-accessing instruction sequences. We capture a sequence of such instructions with the goal of sending the sequence as a single instruction from the CPU to a smart memory subsystem (IRAM or PIM). With a software/hardware codesign, these memory-accessing gestures can be rewritten as succinct superoperator instructions, and the gestures themselves can vary at runtime. As a result, the CPU executes fewer instructions and the CPU-memory bus is charged less often, resulting in lower power consumption; such reductions can be crucial for constrained, embedded systems. We discover gestures using both a static and a dynamic approach, and we present data showing the presence of such gestures in real benchmarks (Java and C). We have shown the gesture-minimization problem to be NP-complete, so in this paper we offer a heuristic approach whose effectiveness we evaluate experimentally.
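
    The following sketch only conveys the shape of the idea; the smart-memory entry point smart_mem_sum() is hypothetical and is modeled on the CPU here. Instead of the CPU issuing one bus transaction per element, a gesture descriptor asks the memory subsystem to perform a whole strided traversal on its side.

    #include <stdio.h>
    #include <stddef.h>

    typedef struct {            /* one superoperator instead of n loads */
        const int *base;
        size_t stride, count;
    } Gesture;

    /* Hypothetical IRAM/PIM command; a real system would execute this
     * inside the memory subsystem rather than on the CPU. */
    static long smart_mem_sum(Gesture g) {
        long s = 0;
        for (size_t i = 0; i < g.count; i++)
            s += g.base[i * g.stride];
        return s;
    }

    static long sum_column(const int *m, size_t rows, size_t cols, size_t col) {
        Gesture g = { m + col, cols, rows };
        return smart_mem_sum(g);        /* one CPU-to-memory command */
    }

    int main(void) {
        int m[2][3] = { {1, 2, 3}, {4, 5, 6} };
        printf("%ld\n", sum_column(&m[0][0], 2, 3, 1));  /* 2 + 5 = 7 */
        return 0;
    }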

    Speculative Staging for Interpreter Optimization

    Full text link
    Interpreters have a bad reputation for having lower performance than just-in-time compilers. We present a new way of building high-performance interpreters that is particularly effective for executing dynamically typed programming languages. The key idea is to combine speculative staging of optimized interpreter instructions with a novel technique for incrementally and iteratively concerting them at run-time. This paper introduces the concepts behind deriving optimized instructions from existing interpreter instructions by incrementally peeling off layers of complexity. When compiling the interpreter, these optimized derivatives are compiled along with the original interpreter instructions. Our technique is therefore portable by construction, since it leverages the existing compiler's backend. At run-time we substitute the interpreter's original, expensive instructions with their optimized derivatives to speed up execution. Our technique unites high performance with the simplicity and portability of interpreters: we report that our optimization makes the CPython interpreter up to more than four times faster, and that our interpreter closes the gap with, and sometimes even outperforms, PyPy's just-in-time compiler.
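
    In the spirit of the paper (though with invented opcodes and value representation), run-time instruction substitution can be sketched as a generic instruction that rewrites itself to an optimized derivative once its speculative assumption is observed to hold, with the derivative guarding the assumption and backing off when it fails.

    #include <stdio.h>

    enum { ADD_GENERIC, ADD_INT, HALT };
    typedef struct { int tag; int ival; } Value;   /* tag 0 = int */

    int main(void) {
        unsigned char code[] = { ADD_GENERIC, ADD_GENERIC, HALT };
        Value stack[8] = { {0, 1}, {0, 2}, {0, 3} };
        Value *sp = stack + 3;
        unsigned char *pc = code;

        for (;;) {
            switch (*pc) {
            case ADD_GENERIC:
                if (sp[-2].tag == 0 && sp[-1].tag == 0) {
                    *pc = ADD_INT;      /* substitute the optimized derivative */
                    continue;           /* re-dispatch the rewritten opcode */
                }
                sp--; sp[-1].ival += sp->ival; pc++;  /* generic path (toy) */
                break;
            case ADD_INT:               /* optimized derivative */
                if (sp[-2].tag != 0 || sp[-1].tag != 0) {
                    *pc = ADD_GENERIC;  /* speculation failed: back off */
                    continue;
                }
                sp--; sp[-1].ival += sp->ival; pc++;
                break;
            case HALT:
                printf("%d\n", sp[-1].ival);   /* prints 6 */
                return 0;
            }
        }
    }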

    A bytecode set for adaptive optimizations

    Get PDF
    The Cog virtual machine features a bytecode interpreter and a baseline just-in-time compiler. To reach the performance level of industrial-quality virtual machines such as Java HotSpot, it needs an adaptive inlining compiler: a tool that aggressively optimizes frequently executed portions of code on the fly. We decided to implement such a tool as a bytecode-to-bytecode optimizer running above the virtual machine, where it can be written and developed in Smalltalk. The optimizer we plan needs to extend the operations encoded in the bytecode set, and its quality depends heavily on the quality of the bytecode set. The bytecode set currently understood by the virtual machine is old and lacks any room to add new operations. We therefore designed a new bytecode set, which includes additional bytecodes that allow the just-in-time compiler to generate less generic, and hence simpler and faster, code sequences for frequently executed primitives. The new bytecode set includes traps for validating speculative inlining decisions and is extensible without compromising optimization opportunities. In addition, we took advantage of this work to remove limitations of the current bytecode set, such as the maximum number of instance variables per class or the number of literals per method. In this paper we describe this new bytecode set. We plan to have it in production in the Cog virtual machine and its Pharo, Squeak and Newspeak clients in the coming year.
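
    A guard of the kind the abstract mentions might look like the following sketch (the structures and names are invented, not Cog's actual encoding): optimized code speculates on the receiver's class, and a trap checks the assumption before the inlined body runs, transferring to deoptimization when it fails.

    #include <stdio.h>

    typedef struct { int class_id; } Object;

    static void deoptimize(void) {
        puts("trap: fall back to unoptimized bytecode");
    }

    /* Guard emitted before an inlined method body. */
    static int trap_class(Object *receiver, int expected_class) {
        if (receiver->class_id != expected_class) {
            deoptimize();     /* speculative inlining decision invalid */
            return 0;
        }
        return 1;             /* inlined fast path may proceed */
    }

    int main(void) {
        Object point = { 42 };
        if (trap_class(&point, 42))
            puts("inlined method body runs directly");
        return 0;
    }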

    Interpreter Register Autolocalisation: Improving the performance of efficient interpreters

    Get PDF
    Language interpreters are generally slower than (JIT-)compiled implementations because they trade performance for simplicity and portability. They are nevertheless still an important part of modern virtual machines (VMs) as one half of mixed-mode execution schemes, for several reasons. On the one hand, not all code gets hot and deserves to be optimized by a JIT compiler; examples of cold code are tests, command-line applications, and scripts. On the other hand, compilers are more difficult to write and maintain, so interpreters are an attractive solution because of their simplicity and portability. In this paper we focus on bytecode interpreters.

    Interpreter performance has long been a hot topic, and several solutions with different degrees of complexity and portability have been proposed. Some work optimizes language-specific features in interpreters, such as type dispatches, using static type predictions, quickening [3] or type specialization [18]. Other work improves general interpreter behavior by minimizing branch mispredictions of interpreter dispatches and by stack caching. Solutions to branch mispredictions propose variants of code threading [1, 4, 6, 7, 10], improved further with selective inlining [14]; others modify the intermediate code (e.g., bytecode) design, using superinstructions [15] or register-based instructions [9, 16]. Stack caching [5] optimizes access to operands by caching the top of the stack.

    Related to stack caching are interpreter registers: interpreter variables that are critical to the efficient execution of the interpreter loop, such as the instruction pointer (IP), the stack pointer (SP), and the frame pointer (FP). Interpreter registers put pressure on the overall design and implementation of the interpreter:

    Req1: Value access outside the interpreter loop. VM routines outside the interpreter loop may require access to interpreter registers. This is the case for garbage collectors that traverse the stack to find root objects, for routines that unwind or reify the stack, and for giving native methods access to stack values.

    Req2: Efficiency. Interpreter registers are used by each instruction to manipulate the instruction stream and the stack; inefficient implementations have a negative impact on performance.
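
    A minimal sketch of this tension, with invented layout and names: inside the hot loop, IP and SP live in C locals (which the compiler can keep in machine registers, Req2), and they are spilled to a VM-visible structure only when a routine outside the loop, such as the garbage collector, must observe them (Req1).

    #include <stdio.h>

    struct VMState { unsigned char *ip; int *sp; } vm;  /* visible to GC etc. */

    static void maybe_gc(void) {
        /* stack walking would read vm.ip and vm.sp here */
    }

    static void interpret(unsigned char *code, int *stack) {
        unsigned char *ip = code;   /* local copies: cheap to access */
        int *sp = stack;

        for (;;) {
            switch (*ip++) {
            case 0:                         /* PUSH1 */
                *sp++ = 1;
                break;
            case 1:                         /* SAFEPOINT */
                vm.ip = ip; vm.sp = sp;     /* spill registers for outsiders */
                maybe_gc();
                ip = vm.ip; sp = vm.sp;     /* reload after possible changes */
                break;
            case 2:                         /* HALT */
                printf("top = %d\n", sp[-1]);
                return;
            }
        }
    }

    int main(void) {
        unsigned char code[] = { 0, 1, 2 };
        int stack[8];
        interpret(code, stack);
        return 0;
    }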