
    Array bounds check elimination in the context of deoptimization

    Whenever an array element is accessed, Java virtual machines execute a compare instruction to ensure that the index value is within the valid bounds. This reduces the execution speed of Java programs. Array bounds check elimination identifies situations in which such checks are redundant and can be removed. We present an array bounds check elimination algorithm for the Java HotSpot™ VM based on static analysis in the just-in-time compiler. The algorithm works on an intermediate representation in static single assignment form and maintains conditions for index expressions. It fully removes bounds checks if it can be proven that they never fail. Whenever possible, it moves bounds checks out of loops. The static number of checks remains the same, but a check inside a loop is likely to be executed more often. If such a check fails, the executing program falls back to interpreted mode, avoiding the problem that an exception is thrown at the wrong place. The evaluation shows a speedup near the theoretical maximum for the scientific SciMark benchmark suite and significant improvements for some Java Grande benchmarks. The algorithm slightly increases the execution speed of the SPECjvm98 benchmark suite. The evaluation of the DaCapo benchmarks shows that array bounds checks do not have a significant impact on the performance of object-oriented applications.
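
    To make the idea concrete, here is a minimal Java sketch (ours, not the paper's code) of the two patterns the algorithm handles: a check that can be removed entirely, and one that can only be hoisted out of the loop.

        public class BoundsCheckDemo {
            // Here the JIT can prove 0 <= i < a.length from the loop condition,
            // so the per-access bounds check is fully redundant and removable.
            static long sum(int[] a) {
                long s = 0;
                for (int i = 0; i < a.length; i++) {
                    s += a[i];
                }
                return s;
            }

            // With an arbitrary bound n the check cannot be removed outright, but
            // a single guard hoisted above the loop (conceptually: n <= a.length)
            // replaces one check per iteration. If the hoisted guard fails at
            // runtime, execution deoptimizes to the interpreter, so the
            // ArrayIndexOutOfBoundsException is still thrown at the exact access
            // where it belongs.
            static long sumPrefix(int[] a, int n) {
                long s = 0;
                for (int i = 0; i < n; i++) {
                    s += a[i];
                }
                return s;
            }

            public static void main(String[] args) {
                int[] a = {1, 2, 3, 4};
                System.out.println(sum(a) + " " + sumPrefix(a, 3));
            }
        }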

    Performance comparison between Java and JNI for optimal implementation of computational micro-kernels

    General-purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improve the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It benefits from programming flexibility and productivity, while performance is ensured by the just-in-time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, the JIT is a black box for developers: we have no control over the generated code and no feedback from its optimization phases, such as vectorization. Second, the time constraint narrows down the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimizations that reduce performance bottlenecks. Java can call native code from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than Java methods. Therefore, to benefit from better static optimization, this call overhead must be amortized by the amount of computation performed at each JNI invocation. In this paper we tackle this problem and propose to carry out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining and the use of native memory for JNI methods. Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347).
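
    The trade-off is easy to state in code. Below is a minimal sketch (our illustration, not the paper's benchmark harness) of the same micro-kernel in pure Java and behind JNI; the library name "kernels" and its C-side implementation are assumptions.

        public class DaxpyKernel {
            // Pure Java: the JIT can inline this and may auto-vectorize it.
            static void daxpyJava(double a, double[] x, double[] y) {
                for (int i = 0; i < x.length; i++) {
                    y[i] += a * x[i];
                }
            }

            // JNI: the body can be statically compiled with aggressive
            // optimizations, but the call is never inlined by the JVM, so its
            // fixed invocation cost must be amortized by enough computation
            // per call.
            static native void daxpyNative(double a, double[] x, double[] y);

            static {
                try {
                    System.loadLibrary("kernels"); // hypothetical native library
                } catch (UnsatisfiedLinkError e) {
                    System.err.println("native kernel unavailable: " + e);
                }
            }

            public static void main(String[] args) {
                double[] x = new double[1024], y = new double[1024];
                java.util.Arrays.fill(x, 2.0);
                daxpyJava(3.0, x, y); // small workloads favor the Java version
                System.out.println(y[0]);
            }
        }

    Which variant wins depends on the calling context: below some amount of work per invocation the JNI overhead dominates, and above it the statically optimized code pays off.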

    JooFlux: Hijacking Java 7 InvokeDynamic To Support Live Code Modifications

    Changing functional and non-functional software implementations at runtime is useful and sometimes even critical, both in development and in production environments. JooFlux is a JVM agent that allows both the dynamic replacement of method implementations and the application of aspect advices. It works by performing bytecode transformations that take advantage of the new invokedynamic instruction, added in Java SE 7 to help implement dynamic languages for the JVM. JooFlux can be managed through a JMX agent to apply dynamic modifications at runtime, without resorting to a dedicated domain-specific language. We compared JooFlux with existing AOP platforms and dynamic languages. The results demonstrate that JooFlux's performance is close to that of plain Java, with a marginal overhead most of the time and occasionally a gain, whereas AOP platforms and dynamic languages present significant overheads. This paves the way for interesting future evolutions and applications of JooFlux.
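
    The mechanism JooFlux builds on can be shown with the java.lang.invoke API directly. The sketch below (ours, simplified; JooFlux itself rewrites bytecode from a JVM agent) routes a call through a MutableCallSite, the kind of target an invokedynamic instruction binds to, so the implementation can be swapped while callers keep running.

        import java.lang.invoke.MethodHandle;
        import java.lang.invoke.MethodHandles;
        import java.lang.invoke.MethodType;
        import java.lang.invoke.MutableCallSite;

        public class LiveSwapDemo {
            static String v1(String who) { return "Hello, " + who; }
            static String v2(String who) { return "Bonjour, " + who; }

            public static void main(String[] args) throws Throwable {
                MethodHandles.Lookup lookup = MethodHandles.lookup();
                MethodType type = MethodType.methodType(String.class, String.class);

                MutableCallSite site =
                    new MutableCallSite(lookup.findStatic(LiveSwapDemo.class, "v1", type));
                MethodHandle invoker = site.dynamicInvoker();

                System.out.println((String) invoker.invokeExact("world")); // Hello, world

                // Live modification: retarget the call site without touching callers.
                site.setTarget(lookup.findStatic(LiveSwapDemo.class, "v2", type));
                System.out.println((String) invoker.invokeExact("world")); // Bonjour, world
            }
        }

    Because the JVM can treat the call-site target as a constant until it changes, such calls compile down to nearly the cost of a plain Java call, which is consistent with the marginal overheads the authors report.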

    Combining Processor Virtualization and Split Compilation for Heterogeneous Multicore Embedded Systems

    Complex embedded systems have always been heterogeneous multicore systems. Because of the tight constraints on power, performance and cost, this situation is not likely to change any time soon. As a result, the software environments required to program those systems have become very complex too. We propose to apply instruction set virtualization and just-in-time compilation techniques to program heterogeneous multicore embedded systems, with several additional requirements:
    * the environment must be able to compile legacy C/C++ code to a target-independent intermediate representation;
    * the just-in-time (JIT) compiler must generate high-performance code;
    * the technology must be able to program the whole system, not just the host processor.
    Advantages of such an environment include, among others, much simpler software engineering, reduced maintenance costs, and fewer legacy-code problems. It also goes beyond mere binary compatibility by providing better exploitation of the hardware platform. We further propose to combine processor virtualization with split compilation to improve the performance of the JIT compiler. Taking advantage of the two-step compilation process, we want to make it possible to run very aggressive optimizations online, even on a very constrained system.
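
    A toy sketch of the split-compilation idea (all names hypothetical, not the authors' toolchain): an offline stage runs expensive analyses ahead of time and ships compact per-method hints alongside the target-independent IR, so the embedded JIT can apply aggressive optimizations cheaply by trusting those hints instead of re-deriving them online.

        import java.util.Map;

        public class SplitCompilationSketch {
            // Produced by the offline stage and shipped with the program image.
            record Hints(boolean vectorizable, int unrollFactor) {}

            static final Map<String, Hints> OFFLINE_HINTS =
                Map.of("Kernel.daxpy", new Hints(true, 4));

            // Stand-in for the online JIT's code generator.
            static void jitCompile(String method) {
                Hints h = OFFLINE_HINTS.getOrDefault(method, new Hints(false, 1));
                System.out.printf("compiling %s: vectorize=%b, unroll=%d%n",
                        method, h.vectorizable(), h.unrollFactor());
                // ...emit target code for the embedded core using the hints...
            }

            public static void main(String[] args) {
                jitCompile("Kernel.daxpy");
            }
        }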

    ShareJIT: JIT Code Cache Sharing across Processes and Its Practical Implementation

    Just-in-time (JIT) compilation coupled with code caching is widely used to improve performance in dynamic programming language implementations. These code caches, along with the associated profiling data for the hot code, however, consume significant amounts of memory. Furthermore, they incur extra JIT compilation time for their creation. On Android, the current standard JIT compiler and its code caches are not shared among processes: the runtime system maintains a private code cache, and its associated data, for each runtime process. However, applications running on the same platform tend to have multiple libraries in common. Sharing cached code across multiple applications and multiple processes can lead to a reduction in memory use. It can directly reduce compile time. It can also reduce the cumulative amount of time spent interpreting code. All three of these effects can improve actual runtime performance. In this paper, we describe ShareJIT, a global code cache for JITs that can share code across multiple applications and multiple processes. We implemented ShareJIT in the context of the Android Runtime (ART), a widely used, state-of-the-art system. To increase sharing, our implementation constrains the amount of context that the JIT compiler can use to optimize the code. This exposes a fundamental tradeoff: increased specialization to a single process's context decreases the extent to which the compiled code can be shared. In ShareJIT, we limit some optimization to increase shareability. To evaluate ShareJIT, we tested 8 popular Android apps in a total of 30 experiments. ShareJIT improved overall performance by 9% on average, while decreasing memory consumption by 16% on average and JIT compilation time by 37% on average. Comment: OOPSLA 2018.
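
    A toy, single-process model of the core idea (not ART's or ShareJIT's actual implementation): key the cache on a context-insensitive identity of the method, such as a hash of its bytecode, so that any process running the same library code can reuse one compiled copy. The real system shares across processes and deliberately limits context-dependent optimization to keep the code reusable.

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        public class SharedCodeCacheSketch {
            record CompiledCode(String description) {}

            // In ShareJIT this lives in memory shared across processes;
            // here a process-local map stands in for it.
            static final Map<Long, CompiledCode> GLOBAL_CACHE = new ConcurrentHashMap<>();

            static CompiledCode compileOrShare(long bytecodeHash, String method) {
                return GLOBAL_CACHE.computeIfAbsent(bytecodeHash, h -> {
                    System.out.println("JIT-compiling " + method); // paid only once
                    return new CompiledCode("code for " + method);
                });
            }

            public static void main(String[] args) {
                compileOrShare(0xCAFEL, "String.indexOf"); // compiles
                compileOrShare(0xCAFEL, "String.indexOf"); // reuses the cached code
                System.out.println(GLOBAL_CACHE.size() + " cache entry");
            }
        }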

    Speculation Without Regret: Reducing Deoptimization Meta-data

    Speculative optimizations are used in most just-in-time (JIT) compilers in order to take advantage of dynamic runtime feedback. These speculative optimizations usually require the compiler to produce meta-data that the virtual machine (VM) can use as a fallback when a speculation fails. This meta-data can be large and incurs a significant memory overhead, since it needs to be stored alongside the machine code for as long as the machine code lives. The design of the Graal compiler leads to many speculations falling back to a similar state and location. In this paper we present deoptimization grouping, an optimization that uses this property of the Graal compiler to reduce the amount of meta-data the VM must store, without having to modify the VM. We compare our technique with existing meta-data compression techniques from the HotSpot Virtual Machine and study how well they combine. To support informed decisions about speculation meta-data, we also present an empirical analysis of the origin, impact and usage of this meta-data.
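
    The grouping idea can be modeled in a few lines (our sketch with hypothetical names, not Graal's data structures): when several speculation guards fall back to an identical interpreter state, the compiler can store one shared metadata record that all of them reference.

        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        public class DeoptGroupingSketch {
            // The state needed to resume in the interpreter: method, bytecode
            // index, and live locals. Guards with equal states can share a record.
            record FrameState(String method, int bci, List<String> liveLocals) {}

            public static void main(String[] args) {
                // Three speculation guards; two deoptimize to the same state.
                List<FrameState> guards = List.of(
                        new FrameState("Foo.bar", 12, List.of("a", "b")),
                        new FrameState("Foo.bar", 12, List.of("a", "b")),
                        new FrameState("Foo.bar", 47, List.of("a")));

                Map<FrameState, Integer> records = new HashMap<>();
                for (FrameState g : guards) {
                    records.computeIfAbsent(g, s -> records.size());
                }
                // Prints: 3 guards -> 2 stored records
                System.out.println(guards.size() + " guards -> "
                        + records.size() + " stored records");
            }
        }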

    Applying meta-functions for improving JavaScript code performance

    In recent years, the expansion of the World Wide Web, and of web runtimes in particular, to all kinds of devices has made JavaScript performance a hot topic. Several approaches to improving the performance of JavaScript applications have been tried by the industrial and research communities. In this paper we review the most popular approaches and propose a novel solution based on meta-programming and source code rewriting. The preliminary results of our experiments are very promising, although more studies are required to know to what extent this approach can improve the performance of real-life JavaScript programs. Sociedad Argentina de Informática e Investigación Operativa.

    Present but Unreachable: Reducing Persistent Latent Secrets in HotSpot JVM

    Applications that manage sensitive secrets, including cryptographic keys, are typically engineered to overwrite the secrets in memory once they're no longer necessary, offering an important defense against forensic attacks on the computer. In a modern garbage-collected memory system, however, live objects will be copied and compacted into new memory pages, with the user program unable to reach and zero out obsolete copies in old memory pages that have not yet been reused. This paper considers this problem in the HotSpot JVM, the default JVM used by the Oracle and OpenJDK Java platforms. We analyze the SerialGC and Garbage First Garbage Collector (G1GC) implementations, showing that sensitive data such as TLS keys are easily extracted from the garbage. To mitigate this issue, we implemented techniques to sanitize older heap pages, and we measure the performance impact, which is sometimes good and sometimes unacceptable. We also discuss how future garbage collectors might be designed from scratch with efficient heap sanitation in mind.
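
    The defensive idiom the paper starts from, and why a copying collector undermines it, fits in a few lines (a minimal illustration, not the paper's code):

        import java.util.Arrays;

        public class SecretWipeDemo {
            public static void main(String[] args) {
                char[] key = "hypothetical-tls-master-secret".toCharArray();
                try {
                    useKey(key);
                } finally {
                    // Zeroes only the current copy of the array. A copying or
                    // compacting GC may already have left intact copies in old
                    // heap pages the program cannot reach; these are exactly the
                    // "present but unreachable" secrets that heap sanitation
                    // would have to clear.
                    Arrays.fill(key, '\0');
                }
            }

            static void useKey(char[] key) {
                System.out.println("using a " + key.length + "-char secret");
            }
        }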