298 research outputs found

    A bytecode set for adaptive optimizations

    Get PDF
    International audienceThe Cog virtual machine features a bytecode interpreter and a baseline Just-in-time compiler. To reach the performance level of industrial quality virtual machines such as Java HotSpot, it needs to employ an adaptive inlining com-piler, a tool that on the fly aggressively optimizes frequently executed portions of code. We decided to implement such a tool as a bytecode to bytecode optimizer, implemented above the virtual machine, where it can be written and developed in Smalltalk. The optimizer we plan needs to extend the operations encoded in the bytecode set and its quality heavily depends on the bytecode set quality. The current bytecode set understood by the virtual machine is old and lacks any room to add new operations. We decided to implement a new bytecode set, which includes additional bytecodes that allow the Just-in-time compiler to generate less generic, and hence simpler and faster code sequences for frequently executed primitives. The new bytecode set includes traps for validating speculative inlining de-cisions and is extensible without compromising optimization opportunities. In addition, we took advantage of this work to solve limitations of the current bytecode set such as the maximum number of instance variable per class, or number of literals per method. In this paper we describe this new byte-code set. We plan to have it in production in the Cog virtual machine and its Pharo, Squeak and Newspeak clients in the coming year

    Towards Superinstructions for Java Interpreters

    Get PDF
    The Java Virtual Machine (JVM) is usually implemented by an interpreter or just-in-time (JIT) compiler. JITs provide the best performance, but interpreters have a number of advantages that make them attractive, especially for embedded systems. These advantages include simplicity, portability and lower memory requirements. Instruction dispatch is responsible for most of the running time of efficient interpreters, especially on pipelined processors. Superinstructions are an important optimisation to reduce the number of instruction dispatches. A superinstruction is a new Java instruction which performs the work of a common sequence of instructions. In this paper we describe work in progress on the design and implementation of a system of superinstructions for an efficient Java interpreter for connected devices and embedded systems. We describe our basic interpreter, the interpreter generator we use to automatically create optimised source code for superinstructions, and discuss Java specific issues relating to superinstructions. Our initial experimental results show that superinstructions can give large speedups on the SPECjvm98 benchmark suite

    Description and Optimization of Abstract Machines in a Dialect of Prolog

    Full text link
    In order to achieve competitive performance, abstract machines for Prolog and related languages end up being large and intricate, and incorporate sophisticated optimizations, both at the design and at the implementation levels. At the same time, efficiency considerations make it necessary to use low-level languages in their implementation. This makes them laborious to code, optimize, and, especially, maintain and extend. Writing the abstract machine (and ancillary code) in a higher-level language can help tame this inherent complexity. We show how the semantics of most basic components of an efficient virtual machine for Prolog can be described using (a variant of) Prolog. These descriptions are then compiled to C and assembled to build a complete bytecode emulator. Thanks to the high level of the language used and its closeness to Prolog, the abstract machine description can be manipulated using standard Prolog compilation and optimization techniques with relative ease. We also show how, by applying program transformations selectively, we obtain abstract machine implementations whose performance can match and even exceed that of state-of-the-art, highly-tuned, hand-crafted emulators.Comment: 56 pages, 46 figures, 5 tables, To appear in Theory and Practice of Logic Programming (TPLP

    High performance annotation-aware JVM for Java cards

    Full text link
    Early applications of smart cards have focused in the area of per-sonal security. Recently, there has been an increasing demand for networked, multi-application cards. In this new scenario, enhanced application-specific on-card Java applets and complex cryptographic services are executed through the smart card Java Virtual Machine (JVM). In order to support such computation-intensive applica-tions, contemporary smart cards are designed with built-in micro-processors and memory. As smart cards are highly area-constrained environments with memory, CPU and peripherals competing for a very small die space, the VM execution engine of choice is often a small, slow interpreter. In addition, support for multiple applica-tions and cryptographic services demands high performance VM execution engine. The above necessitates the optimization of the JVM for Java Cards

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    Get PDF
    This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. Especially the computational performance is evaluated thoroughly, and the results show that the REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.Siirretty Doriast

    낎임형 시슀템에서 Just-in-Time 및 Ahead-of-Time 컎파음을 활용한 í•˜ìŽëžŒëŠŹë“œ 자바 컎파음

    Get PDF
    í•™ìœ„ë…ŒëŹž (ë°•ì‚Ź)-- 서욞대학ꔐ 대학원 : ì „êž°Â·ì»Ží“ší„°êł”í•™ë¶€, 2015. 8. ëŹžìˆ˜ëŹ”.Many embedded Java software platforms execute two types of Java classes: those installed statically on the client device and those downloaded dynamically from service providers at runtime. For higher performance, it would be desirable to compile static Java classes by ahead-of-time compiler (AOTC) and to handle dynamically downloaded classes by just-in-time compiler (JITC), providing a hybrid compilation environment. We propose a hybrid Java compilation approach and performs an initial case study with a hybrid environment, which is constructed simply by merging an existing AOTC and a JITC for the same JVM. Contrary to our expectations, the hybrid environment does not deliver a performance, in-between of full-JITCs and full-AOTCs. In fact, its performance is even lower than full-JITCs for many benchmarks. We analyzed the result and found that a naive merge of JITC and AOTC may result in inefficiencies, especially due to calls between JITC methods and AOTC methods. We also observed that the distribution of JITC methods and AOTC methods is also important, and experimented with various distributions to understand when a hybrid environment can deliver a desired performance. The Android Java is to be executed by the Dalvik virtual machine (VM), which is quite different from the traditional Java VM such as Oracles HotSpot VM. That is, Dalvik employs register-based bytecode while HotSpot employs stack-based bytecode, requiring a different way of interpretation. Also, Dalvik uses trace-based just-in-time compilation (JITC), while HotSpot uses method-based JITC. Therefore, it is questioned how the Dalvik VM performs compared the HotSpot VM. Unfortunately, there has been little comparative evaluation of both VMs, so the performance of the Dalvik VM is not well understood. More importantly, it is also not well understood how the performance of the Dalvik VM affects the overall performance of the Android applications (apps). We make an attempt to evaluate the Dalvik VM. We install both VMs on the same board and compare the performance using EEMBC benchmark. In the JITC mode, Dakvik is slower than HotSpot by more than 2.9 times and its generated code size is not smaller than HotSpots due to its worse code quality and trace-chaining code. We also investigated how real Android apps are different from Java benchmarks, to understand why the slow Dalvik VM does not affect the performance of the Android apps seriously. We proposes a bytecode-to-C ahead-of-time compilation (AOTC) for the DVM to accelerate pre-installed apps. We translated the bytecode of some of the hot methods used by these apps to C code, which is then compiled together with the DVM source code. AOTC-generated code works with the existing Android zygote mechanism, with correct garbage collection and exception handling. Due to off-line, method-based compilation using existing compiler with full optimizations and Java-specific optimizations, AOTC can generate quality code while obviating runtime compilation overhead. For benchmarks, AOTC can improve the performance by 65%. When we compare with the recently-introduced ART, which also performs ahead-of-time compilation, our AOTC performs better. We cannot AOTC all middleware and framework methods in DTV and android device for hybrid compilation. By case study on DTV, we found that we need to adopt AOTC enough methods and reduce method call overhead. We propose AOTC method selection heuristic using method call chain. We select hot methods and call chain methods using profile data. Our heuristic based on method call chain get better performance than other heuristics.Chapter 1 Introduction 1 1.1 The need of hybrid compilation 1 1.2 Outline of the Dissertation 2 Chapter 2 Hybrid Compilation for Java Virtual Machine 3 2.1 The Approach of Hybrid Compilation 3 2.2 The JITC and AOTC 6 2.2.1 JVM and the Interpreter 7 2.2.2 The JITC 8 2.2.3 The AOTC 9 2.3 Hybrid Compilation Environment 11 2.4 Analysis of the Hybrid Environment 14 2.4.1 Call Behavior of Benchmarks 14 2.4.2 Call Overhead 15 2.4.3 Application Methods and Library Methods 18 2.4.4 Improving hybrid performance 20 2.4.4.1 Reducing the JITC-to-AOTC call overhead 20 2.4.4.2 Performance impact of the distribution of JITC methods and AOTC methods 21 Chapter 3 Evaluation of Dalvik Virtual Machine 23 3.1 Android Platform 23 3.2 Java VM and Dalvik VM 24 3.2.1 Bytecode ISA 25 3.2.3 Just-in-Time Compilation (JITC) 27 3.3 Experimental Results 32 3.3.1 Experimental Environment 33 3.3.2 Interpreter Performance 34 3.3.3 JITC Performance 37 3.3.4 Trace Extension 43 3.4 Behavior of Real Android Apps 44 Chapter 4 Ahead-of-Time Compilation for Dalvik Virtual Machine 51 4.1 Android and Dalvik VM Execution 51 4.1.1 Android Execution Model 51 4.1.2 Dalvik VM 51 4.1.3 Dexopt and JITC in the Dalvik VM 53 4.2 AOTC Architecture 54 4.3 Design and Implementation of AOTC 56 4.3.1 Dexopt and Code Generation 56 4.3.2 C Code Generation 56 4.3.3 AOTC Method Call 58 4.3.4 Garbage Collection 61 4.3.5 Exception Handling 62 4.3.6 AOTC Method Linking 63 4.4 AOTC Code Optimization 64 4.4.1 Method Inlining 64 4.4.2 Spill Optimization 64 4.4.3 Elimination of Redundant Code 65 4.5 Experimental Result 65 4.5.1 Experimental Environment 66 4.5.2 AOTC Target Methods 66 4.5.3 Performance Impact of AOTC 67 4.5.4 DVM AOTC vs. ART 68 Chapter 5 Selecting Ahead-of-Time Compilation Target Methods for Hybrid Compilation 70 5.1 Hybrid Compilation on DTV 70 5.2 Hybrid Compilation on Android Device 72 5.3 AOTC for Hybrid Compilation 74 5.3.1 AOTC Target Methods 74 5.3.2 Case Study: Selecting on DTV 75 5.4 Method Selection Using Call Chain 77 5.5 Experimental Result 77 5.5.1 Experimental Environment 78 5.5.2 Performance Impact 79 Chapter 6 Related Works 81 Chapter 7 Conclusion 84 Bibliography 86Docto

    JVM-based Techniques for Improving Java Observability

    Get PDF
    Observability measures the support of computer systems to accurately capture, analyze, and present (collectively observe) the internal information about the systems. Observability frameworks play important roles for program understanding, troubleshooting, performance diagnosis, and optimizations. However, traditional solutions are either expensive or coarse-grained, consequently compromising their utility in accommodating today’s increasingly complex software systems. New solutions are emerging for VM-based languages due to the full control language VMs have over program executions. Existing such solutions, nonetheless, still lack flexibility, have high overhead, or provide limited context information for developing powerful dynamic analyses. In this thesis, we present a VM-based infrastructure, called marker tracing framework (MTF), to address the deficiencies in the existing solutions for providing better observability for VM-based languages. MTF serves as a solid foundation for implementing fine-grained low-overhead program instrumentation. Specifically, MTF allows analysis clients to: 1) define custom events with rich semantics ; 2) specify precisely the program locations where the events should trigger; and 3) adaptively enable/disable the instrumentation at runtime. In addition, MTF-based analysis clients are more powerful by having access to all information available to the VM. To demonstrate the utility and effectiveness of MTF, we present two analysis clients: 1) dynamic typestate analysis with adaptive online program analysis (AOPA); and 2) selective probabilistic calling context analysis (SPCC). In addition, we evaluate the runtime performance of MTF and the typestate client with the DaCapo benchmarks. The results show that: 1) MTF has acceptable runtime overhead when tracing moderate numbers of marker events; and 2) AOPA is highly effective in reducing the event frequency for the dynamic typestate analysis; and 3) language VMs can be exploited to offer greater observability

    A selective dynamic compiler for embedded Java virtual machine targeting ARM processors

    Get PDF
    Tableau d’honneur de la FacultĂ© des Ă©tudes supĂ©rieures et postdoctorales, 2004-2005Ce travail prĂ©sente une nouvelle technique de compilation dynamique sĂ©lective pour les systĂšmes embarquĂ©s avec processeurs ARM. Ce compilateur a Ă©tĂ© intĂ©grĂ© dans la plateforme J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Con- figuration). L’objectif principal de notre travail est d’obtenir une machine virtuelle accĂ©lĂ©rĂ©e, lĂ©gĂšre et compacte prĂȘte pour l’exĂ©cution sur les systĂšmes embarquĂ©s. Cela est atteint par l’implĂ©mentation d’un compilateur dynamique sĂ©lectif pour l’architecture ARM dans la Kilo machine virtuelle de Sun (KVM). Ce compilateur est appelĂ© Armed E-Bunny. PremiĂšrement, on prĂ©sente la plateforme Java, le Java 2 Micro Edition(J2ME) pour les systĂšmes embarquĂ©s et les composants de la machine virtuelle Java. Ensuite, on discute les diffĂ©rentes techniques d’accĂ©lĂ©ration pour la machine virtuelle Java et on dĂ©taille le principe de la compilation dynamique. Enfin, on illustre l’architecture, le design (la conception), l’implĂ©mentation et les rĂ©sultats expĂ©rimentaux de notre compilateur dynamique sĂ©lective Armed E-Bunny. La version modifiĂ©e de KVM a Ă©tĂ© portĂ©e sur un ordinateur de poche (PDA) et a Ă©tĂ© testĂ©e en utilisant un benchmark standard de J2ME. Les rĂ©sultats expĂ©rimentaux de la performance montrent une accĂ©lĂ©ration de 360 % par rapport Ă  la derniĂšre version de la KVM de Sun avec un espace mĂ©moire additionnel qui n’excĂšde pas 119 kilobytes.This work presents a new selective dynamic compilation technique targeting ARM 16/32-bit embedded system processors. This compiler is built inside the J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Configuration) platform. The primary objective of our work is to come up with an efficient, lightweight and low-footprint accelerated Java virtual machine ready to be executed on embedded machines. This is achieved by implementing a selective ARM dynamic compiler called Armed E-Bunny into Sun’s Kilobyte Virtual Machine (KVM). We first present the Java platform, Java 2 Micro Edition (J2ME) for embedded systems and Java virtual machine components. Then, we discuss the different acceleration techniques for Java virtual machine and we detail the principle of dynamic compilation. After that we illustrate the architecture, design, implementation and experimental results of our selective dynamic compiler Armed E-Bunny. The modified KVM is ported on a handheld PDA and is tested using standard J2ME benchmarks. The experimental results on its performance demonstrate that a speedup of 360% over the last version of Sun’s KVM is accomplished with a footprint overhead that does not exceed 119 kilobytes
    • 

    corecore