73 research outputs found

    Optimizations for a Java Interpreter Using Instruction Set Enhancement

    Get PDF
    Several methods for optimizing Java interpreters have been proposed that involve augmenting the existing instruction set. In this paper we describe the design and implementation of three such optimizations for an efficient Java interpreter. Specialized instructions are new versions of existing instructions with commonly occurring operands hardwired into them, which reduces operand fetching. Superinstructions are new Java instructions which perform the work of common sequences of instructions. Finally, instruction replication is the duplication of existing instructions with a view to improving branch prediction accuracy. We describe our basic interpreter, the interpreter generator we use to automatically create optimised source code for enhanced instructions, and discuss Java specific issues relating to these optimizations. Experimental results show significant speedups (up to a factor of 3.3, and realistic average speedups of 30%-35%) are attainable using these techniques

    Repositioning Tiered HotSpot Execution Performance Relative to the Interpreter

    Full text link
    Although the advantages of just-in-time compilation over traditional interpretive execution are widely recognised, there needs to be more current research investigating and repositioning the performance differences between these two execution models relative to contemporary workloads. Specifically, there is a need to examine the performance differences between Java Runtime Environment (JRE) Java Virtual Machine (JVM) tiered execution and JRE JVM interpretive execution relative to modern multicore architectures and modern concurrent and parallel benchmark workloads. This article aims to fill this research gap by presenting the results of a study that compares the performance of these two execution models under load from the Renaissance Benchmark Suite. This research is relevant to anyone interested in understanding the performance differences between just-in-time compiled code and interpretive execution. It provides a contemporary assessment of the interpretive JVM core, the entry and starting point for bytecode execution, relative to just-in-time tiered execution. The study considers factors such as the JRE version, the GNU GCC version used in the JRE build toolchain, and the garbage collector algorithm specified at runtime, and their impact on the performance difference envelope between interpretive and tiered execution. Our findings indicate that tiered execution is considerably more efficient than interpretive execution, and the performance gap has increased, ranging from 4 to 37 times more efficient. On average, tiered execution is approximately 15 times more efficient than interpretive execution. Additionally, the performance differences between interpretive and tiered execution are influenced by workload category, with narrower performance differences observed for web-based workloads and more significant differences for Functional and Scala-type workloads.Comment: 17 page

    Towards Superinstructions for Java Interpreters

    Get PDF
    The Java Virtual Machine (JVM) is usually implemented by an interpreter or just-in-time (JIT) compiler. JITs provide the best performance, but interpreters have a number of advantages that make them attractive, especially for embedded systems. These advantages include simplicity, portability and lower memory requirements. Instruction dispatch is responsible for most of the running time of efficient interpreters, especially on pipelined processors. Superinstructions are an important optimisation to reduce the number of instruction dispatches. A superinstruction is a new Java instruction which performs the work of a common sequence of instructions. In this paper we describe work in progress on the design and implementation of a system of superinstructions for an efficient Java interpreter for connected devices and embedded systems. We describe our basic interpreter, the interpreter generator we use to automatically create optimised source code for superinstructions, and discuss Java specific issues relating to superinstructions. Our initial experimental results show that superinstructions can give large speedups on the SPECjvm98 benchmark suite

    Proxy compilation for Java via a code migration technique

    Get PDF
    There is an increasing trend that intermediate representations (IRs) are used to deliver programs in more and more languages, such as Java. Although Java can provide many advantages, including a wider portability and better optimisation opportunities on execution, it introduces extra overhead by requiring an IR translation for the program execution. For maximum execution performance, an optimising compiler is placed in the runtime to selectively optimise code regions regarded as “hotspots”. This common approach has been effectively deployed in many implementation of programming languages. However, the computational resources demanded by this approach made it less efficient, or even difficult to deploy directly in a resourceconstrained environment. One implementation approach is to use a remote compilation technique to support compilation during the execution. The work presented in this dissertation supports the thesis that execution performance can be improved by the use of efficient optimising compilation by using a proxy dynamic optimising compiler. After surveying various approaches to the design and implementation of remote compilation, a proxy compilation system called Apus is defined. To demonstrate the effectiveness of using a dynamic optimising compiler as a proxy compiler, a complete proxy compilation system is written based on a research-oriented Java VirtualMachine (JVM). The proxy compilation system is discussed in detail, showing how to deliver remote binaries and manage a cache of binaries by using a code migration approach. The proxy compilation client shows how the proxy compilation service is integrated with the selective optimisation system to maximise execution performance. The results of empirical measurements of the system are given, showing the efficiency of code optimisation from either the proxy compilation service and a local binary cache. The conclusion of this work is that Java execution performance can be improved by efficient optimising compilation with a proxy compilation service by using a code migration technique

    High performance annotation-aware JVM for Java cards

    Full text link
    Early applications of smart cards have focused in the area of per-sonal security. Recently, there has been an increasing demand for networked, multi-application cards. In this new scenario, enhanced application-specific on-card Java applets and complex cryptographic services are executed through the smart card Java Virtual Machine (JVM). In order to support such computation-intensive applica-tions, contemporary smart cards are designed with built-in micro-processors and memory. As smart cards are highly area-constrained environments with memory, CPU and peripherals competing for a very small die space, the VM execution engine of choice is often a small, slow interpreter. In addition, support for multiple applica-tions and cryptographic services demands high performance VM execution engine. The above necessitates the optimization of the JVM for Java Cards

    A selective dynamic compiler for embedded Java virtual machine targeting ARM processors

    Get PDF
    Tableau d’honneur de la Faculté des études supérieures et postdoctorales, 2004-2005Ce travail présente une nouvelle technique de compilation dynamique sélective pour les systèmes embarqués avec processeurs ARM. Ce compilateur a été intégré dans la plateforme J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Con- figuration). L’objectif principal de notre travail est d’obtenir une machine virtuelle accélérée, légère et compacte prête pour l’exécution sur les systèmes embarqués. Cela est atteint par l’implémentation d’un compilateur dynamique sélectif pour l’architecture ARM dans la Kilo machine virtuelle de Sun (KVM). Ce compilateur est appelé Armed E-Bunny. Premièrement, on présente la plateforme Java, le Java 2 Micro Edition(J2ME) pour les systèmes embarqués et les composants de la machine virtuelle Java. Ensuite, on discute les différentes techniques d’accélération pour la machine virtuelle Java et on détaille le principe de la compilation dynamique. Enfin, on illustre l’architecture, le design (la conception), l’implémentation et les résultats expérimentaux de notre compilateur dynamique sélective Armed E-Bunny. La version modifiée de KVM a été portée sur un ordinateur de poche (PDA) et a été testée en utilisant un benchmark standard de J2ME. Les résultats expérimentaux de la performance montrent une accélération de 360 % par rapport à la dernière version de la KVM de Sun avec un espace mémoire additionnel qui n’excède pas 119 kilobytes.This work presents a new selective dynamic compilation technique targeting ARM 16/32-bit embedded system processors. This compiler is built inside the J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Configuration) platform. The primary objective of our work is to come up with an efficient, lightweight and low-footprint accelerated Java virtual machine ready to be executed on embedded machines. This is achieved by implementing a selective ARM dynamic compiler called Armed E-Bunny into Sun’s Kilobyte Virtual Machine (KVM). We first present the Java platform, Java 2 Micro Edition (J2ME) for embedded systems and Java virtual machine components. Then, we discuss the different acceleration techniques for Java virtual machine and we detail the principle of dynamic compilation. After that we illustrate the architecture, design, implementation and experimental results of our selective dynamic compiler Armed E-Bunny. The modified KVM is ported on a handheld PDA and is tested using standard J2ME benchmarks. The experimental results on its performance demonstrate that a speedup of 360% over the last version of Sun’s KVM is accomplished with a footprint overhead that does not exceed 119 kilobytes

    An Efficient and Flexible Implementation of Aspect-Oriented Languages

    Get PDF
    Compilers for modern object-oriented programming languages generate code in a platform independent intermediate language preserving the concepts of the source language; for example, classes, fields, methods, and virtual or static dispatch can be directly identified within the intermediate code. To execute this intermediate code, state-of-the-art implementations of virtual machines perform just-in-time (JIT) compilation of the intermediate language; i.e., the virtual instructions in the intermediate code are compiled to native machine code at runtime. In this step, a declarative representation of source language concepts in the intermediate language facilitates highly efficient adaptive and speculative optimization of the running program which may not be possible otherwise. In contrast, constructs of aspect-oriented languages - which improve the separation of concerns - are commonly realized by compiling them to conventional intermediate language instructions or by driving transformations of the intermediate code, which is called weaving. This way the aspect-oriented constructs' semantics is not preserved in a declarative manner at the intermediate language level. This representational gap between aspect-oriented concepts in the source code and in the intermediate code hinders high performance optimizations and weakens features of software engineering processes like debugging support or the continuity property of incremental compilation: modifying an aspect in the source code potentially requires re-weaving multiple other modules. To leverage language implementation techniques for aspect-oriented languages, this thesis proposes the Aspect-Language Implementation Architecture (ALIA) which prescribes - amongst others - the existence of an intermediate representation preserving the aspect-oriented constructs of the source program. A central component of this architecture is an extensible and flexible meta-model of aspect-oriented concepts which acts as an interface between front-ends (usually a compiler) and back-ends (usually a virtual machine) of aspect-oriented language implementations. The architecture and the meta-model are embodied for Java-based aspect-oriented languages in the Framework for Implementing Aspect Languages (FIAL) respectively the Language-Independent Aspect Meta-Model (LIAM) which is part of the framework. FIAL generically implements the work flows required from an execution environment when executing aspects provided in terms of LIAM. In addition to the first-class intermediate representation of aspect-oriented concepts, ALIA - and the FIAL framework as its incarnation - treat the points of interaction between aspects and other modules - so-called join points - as being late-bound to an implementation. In analogy to the object-oriented terminology for late-bound methods, the join points are called virtual in ALIA. Together, the first-class representation of aspect-oriented concepts in the intermediate representation as well as treating join points as being virtual facilitate the implementation of new and effective optimizations for aspect-oriented programs. Three different instantiations of the FIAL framework are presented in this thesis, showcasing the feasibility of integrating language back-ends with different characteristics with the framework. One integration supports static aspect deployment and produces results similar to conventional aspect weavers; the woven code is executable on any standard Java virtual machine. Two instantiations are fully dynamic, where one is realized as a portable plug-in for standard Java virtual machines and the other one, called Steamloom^ALIA , is realized as a deep integration into a specific virtual machine, the Jikes Research Virtual Machine Alpern2005. While the latter instantiation is not portable, it exhibits an outstanding performance. Virtual join point dispatch is a generalization of virtual method dispatch. Thus, well established and elaborate optimization techniques from the field of virtual method dispatch are re-used with slight adaptations in Steamloom^ALIA . These optimizations for aspect-oriented concepts go beyond the generation of optimal bytecode. Especially strikingly, the power of such optimizations is shown in this thesis by the examples of the cflow dynamic property, which may be necessary to evaluate during virtual join point dispatch, and dynamic aspect deployment - i.e., the selective modification of specific join points' dispatch. In order to evaluate the optimization techniques developed in this thesis, a means for benchmarking has been developed in terms of macro-benchmarks; i.e., real-world applications are executed. These benchmarks show that for both concepts the implementation presented here is at least circa twice as fast as state-of-the-art implementations performing static optimizations of the generated bytecode; in many cases this thesis's optimizations even reach a speed-up of two orders of magnitude for the cflow implementation and even four orders of magnitude for the dynamic deployment. The intermediate representation in terms of LIAM models is general enough to express the constructs of multiple aspect-oriented languages. Therefore, optimizations of features common to different languages are available to applications written in all of them. To proof that the abstractions provided by LIAM are sufficient to act as intermediate language for multiple aspect-oriented source languages, an automated translation from source code to LIAM models has been realized for three very different and popular aspect-oriented languages: AspectJ, JAsCo and Compose*. In addition, the feasibility of translating from CaesarJ to LIAM models is shown by discussion. The use of an extensible meta-model as intermediate representation furthermore simplifies the definition of new aspect-oriented language concepts as is shown in terms of a tutorial-style example of designing a domain specific extension to the Java language in this thesis

    Towards an embedded real-time Java virtual machine

    Get PDF
    Most computers today are embedded, i.e. they are built into some products or system that is not perceived as a computer. It is highly desirable to use modern safe object-oriented software techniques for a rapid development of reliable systems. However, languages and run-time platforms for embedded systems have not kept up with the front line of language development. Reasons include complex and, in some cases, contradictory requirements on timing, concurrency, predictability, safety, and flexibility. A carefully tailored Java virtual machine (called IVM) is proposed as an approach to overcome these difficulties. In particular, real-time garbage collection has been considered an essential part. The set of bytecodes has been revised to require less memory and to facilitate predictable execution. To further reduce the memory footprint, the class loader can be located outside the embedded processor. Since the accomplished concurrency is crucial for the function of many embedded applications, the scheduling can be defined on the application level in Java. Finally considering future needs for flexibility and on-line configuration of embedded system, the IVM has a unique structure with which, for instance, methods being objects that can be replaced and GCed. The approach has been experimentally verified by a full prototype implementation of such a virtual machine. By making the prototype available for development of real products, this in turn has confronted the solutions with real industrial demands. It was found that the IVM can be easily integrated in typical systems today and the mentioned requirements are fulfilled. Based on experiences from more than 10 projects utilising the novel Java-oriented techniques, there are reasons to believe that the proposed approach is very promising for future flexible embedded systems

    Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

    Get PDF
    Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements

    Increasing the Performance and Predictability of the Code Execution on an Embedded Java Platform

    Get PDF
    This thesis explores the execution of object-oriented code on an embedded Java platform. It presents established and derives new approaches for the implementation of high-level object-oriented functionality and commonly expected system services. The goal of the developed techniques is the provision of the architectural base for an efficient and predictable code execution. The research vehicle of this thesis is the Java-programmed SHAP platform. It consists of its platform tool chain and the highly-customizable SHAP bytecode processor. SHAP offers a fully operational embedded CLDC environment, in which the proposed techniques have been implemented, verified, and evaluated. Two strands are followed to achieve the goal of this thesis. First of all, the sequential execution of bytecode is optimized through a joint effort of an optimizing offline linker and an on-chip application loader. Additionally, SHAP pioneers a reference coloring mechanism, which enables a constant-time interface method dispatch that need not be backed a large sparse dispatch table. Secondly, this thesis explores the implementation of essential system services within designated concurrent hardware modules. This effort is necessary to decouple the computational progress of the user application from the interference induced by time-sharing software implementations of these services. The concrete contributions comprise a spill-free, on-chip stack; a predictable method cache; and a concurrent garbage collection. Each approached means is described and evaluated after the relevant state of the art has been reviewed. This review is not limited to preceding small embedded approaches but also includes techniques that have proven successful on larger-scale platforms. The other way around, the chances that these platforms may benefit from the techniques developed for SHAP are discussed
    corecore