
    Performance comparison between Java and JNI for optimal implementation of computational micro-kernels

    General purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction level parallelism, so related optimizations are currently critical to the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative: it benefits from Java's programming flexibility and productivity, while performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, it is a black box for developers: there is no control over the generated code and no feedback from its optimization phases, such as vectorization. Second, its time constraints narrow the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimizations that reduce performance bottlenecks. Java can call native code from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and are more expensive to invoke than Java methods. To benefit from better static optimization, this call overhead must therefore be amortized by the amount of computation performed at each JNI invocation. In this paper we tackle this problem by carrying out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining, and the use of native memory for JNI methods. Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)
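
    To make the trade-off concrete, consider a simple daxpy micro-kernel exposed both through JNI and in pure Java. This is a minimal sketch with hypothetical names; the native library and its build are assumed:

        // Sketch: a micro-kernel with a JNI binding and a pure-Java version.
        // The JNI call overhead pays off only if the statically compiled
        // kernel performs enough work per invocation.
        public class DaxpyKernel {
            static {
                System.loadLibrary("kernels"); // hypothetical native library
            }

            // Statically compiled (e.g., by GCC or LLVM); never inlined by the JIT.
            public static native void daxpyNative(double a, double[] x, double[] y);

            // Pure-Java version the JIT can inline and possibly vectorize.
            public static void daxpyJava(double a, double[] x, double[] y) {
                for (int i = 0; i < x.length; i++) {
                    y[i] += a * x[i];
                }
            }
        }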

    Dynamic Information Flow Analysis in Ruby

    With the rapid growth of the internet and online applications, there is a huge demand for applications that preserve data privacy and integrity. Applications are already complex with business logic; adding data safety logic would make them more complicated, and the more complex the code becomes, the more room it leaves for security-critical bugs. To solve this conundrum, we can push data safety handling to the language level rather than the application level. With a secure language, developers can write their applications without having to worry about data security. This project introduces dynamic information flow analysis in Ruby. I extend JRuby, a widely used implementation of Ruby written in Java. Information flow analysis classifies the variables used in a program into different security levels and monitors the data flow across levels. Ruby currently supports data integrity through a tainting mechanism. This project extends that mechanism to handle implicit data flows, enabling it to protect confidentiality as well as integrity. Experimental results based on Ruby benchmarks are presented in this paper, showing that the project protects confidentiality at the cost of a 1.2x to 10x slowdown in execution time.
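
    Since JRuby is implemented in Java, the core of such an extension is propagating a taint flag through both explicit and implicit flows. The following is a minimal Java sketch of the idea, illustrative only and not the project's actual classes:

        // Sketch: taint propagation for explicit and implicit flows.
        public final class TaintedValue {
            final String value;
            final boolean tainted;

            public TaintedValue(String value, boolean tainted) {
                this.value = value;
                this.tainted = tainted;
            }

            // Explicit flow: the result is tainted if either operand is.
            public TaintedValue concat(TaintedValue other) {
                return new TaintedValue(value + other.value, tainted || other.tainted);
            }

            // Implicit flow: a value assigned under a branch whose condition is
            // tainted inherits that taint; this is what protects confidentiality.
            public TaintedValue assignedUnder(boolean branchTaint) {
                return new TaintedValue(value, tainted || branchTaint);
            }
        }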

    Exploring Dynamic Compilation and Cross-Layer Object Management Policies for Managed Language Applications

    Recent years have witnessed the widespread adoption of managed programming languages that are designed to execute on virtual machines. Virtual machine architectures provide several powerful software engineering advantages over statically compiled binaries, such as portable program representations, additional safety guarantees, automatic memory and thread management, and dynamic program composition, which have largely driven their success. To support and facilitate the use of these features, virtual machines implement a number of services that adaptively manage and optimize application behavior during execution. Such runtime services often require tradeoffs between efficiency and effectiveness, and different policies can have major implications for the system's performance and energy requirements. In this work, we extensively explore policies for the two runtime services that are most important for achieving performance and energy efficiency: dynamic (or Just-In-Time (JIT)) compilation and memory management. First, we examine the properties of single-tier and multi-tier JIT compilation policies in order to find strategies that realize the best program performance for existing and future machines. Our analysis performs hundreds of experiments with different compiler aggressiveness and optimization levels to evaluate the performance impact of varying if and when methods are compiled. We then investigate how to optimize program regions to maximize performance in JIT compilation environments. For this study, we conduct a thorough analysis of the behavior of optimization phases in our dynamic compiler, and construct a custom experimental framework to determine the performance limits of phase selection during dynamic compilation. Next, we explore innovative memory management strategies to improve energy efficiency in the memory subsystem. We propose and develop a novel cross-layer approach to memory management that integrates information and analysis in the VM with fine-grained management of memory resources in the operating system. Using custom as well as standard benchmark workloads, we perform a detailed evaluation that demonstrates the energy-saving potential of our approach. We implement and evaluate all of our studies using the industry-standard Oracle HotSpot Java Virtual Machine to ensure that our conclusions are supported by widely used, state-of-the-art runtime technology.
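
    For context, HotSpot already exposes standard flags for steering such compilation policies; configurations of the following kind span the single-tier and multi-tier design space (the thresholds and application name are illustrative, not the paper's exact experimental settings):

        java -XX:-TieredCompilation MyApp       # single tier: optimizing compiler only
        java -XX:TieredStopAtLevel=1 MyApp      # single tier: baseline compiler only
        java -XX:+TieredCompilation MyApp       # multi-tier: baseline first, then optimizing
        java -XX:CompileThreshold=5000 MyApp    # vary *when* methods get compiled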

    An Analysis of Type Checks in the Java Virtual Machine

    A program written in the Java language is executed by a virtual machine called the Java Virtual Machine. The development of this language over the years, especially in the Web domain, has imposed strict security models, one of which is the verification of a program's bytecode. The Interface construct of the Java language raises issues for bytecode verification, since it offers the programmer an alternative to multiple inheritance, which is not allowed in Java. Some of the type checks to be performed on interfaces cannot be carried out by the verifier and must be deferred to run time, where they are performed by the Java interpreter. This solution, however, causes some slowdown in the interpreter's performance. The aim of this thesis is to identify the run-time checks performed on interfaces, eliminate them, and run a series of tests to quantify the performance gain that can be obtained.
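
    A minimal Java example of the pattern the thesis targets (illustrative code, not from the thesis): the bytecode verifier treats interface references loosely, so the check that a receiver actually implements the interface is deferred to the checkcast and invokeinterface instructions at run time.

        interface Greeter {
            void greet();
        }

        class ConsoleGreeter implements Greeter {
            public void greet() { System.out.println("hello"); }
        }

        public class InterfaceCheckDemo {
            public static void main(String[] args) {
                Object o = new ConsoleGreeter();
                Greeter g = (Greeter) o; // checkcast: type check at run time
                g.greet();               // invokeinterface: receiver check at run time
            }
        }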

    An Empirical Study on Deoptimization in the Graal Compiler

    Managed language platforms such as the Java Virtual Machine or the Common Language Runtime rely on a dynamic compiler to achieve high performance. Besides making optimization decisions based on the actual program execution and the underlying hardware platform, a dynamic compiler is also in an ideal position to perform speculative optimizations. However, these tend to increase compilation costs, because unsuccessful speculations trigger deoptimization and recompilation of the affected parts of the program, wasting previous work. Even though speculative optimizations are widely used, their costs in terms of extra compilation work have not been previously studied. In this paper, we analyze the behavior of the Graal dynamic compiler integrated in Oracle's HotSpot Virtual Machine. We focus on situations which cause program execution to switch from machine code to the interpreter, and compare application performance using three different deoptimization strategies which influence the amount of extra compilation work done by Graal. Using an adaptive deoptimization strategy, we managed to improve the average start-up performance of benchmarks from the DaCapo, ScalaBench, and Octane benchmark suites, mostly by avoiding wasted compilation work. On a single-core system, we observed an average speed-up of 6.4% for the DaCapo and ScalaBench workloads, and a speed-up of 5.1% for the Octane workloads; the improvement decreases with an increasing number of available CPU cores. We also find that the choice of deoptimization strategy has negligible impact on steady-state performance. This indicates that the cost of speculation matters mainly during start-up, where it can disturb the delicate balance between executing the program and running the compiler, but is quickly amortized in steady state.
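
    As an illustration of the kind of speculation involved (a generic example, not taken from the paper): a dynamic compiler may compile a virtual call assuming the single receiver type observed so far, guard that assumption, and deoptimize back to the interpreter when the guard fails.

        // Sketch: a speculative monomorphic call that can trigger deoptimization.
        abstract class Shape { abstract double area(); }
        class Circle extends Shape { double area() { return Math.PI; } }
        class Square extends Shape { double area() { return 1.0; } }

        public class DeoptDemo {
            static double total(Shape[] shapes) {
                double sum = 0;
                for (Shape s : shapes) sum += s.area(); // speculated: always Circle
                return sum;
            }

            public static void main(String[] args) {
                Shape[] warm = new Shape[100_000];
                java.util.Arrays.fill(warm, new Circle());
                total(warm);                          // compiler speculates on Circle
                total(new Shape[] { new Square() });  // guard fails: deoptimize, recompile
            }
        }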

    Towards co-designed optimizations in parallel frameworks: A MapReduce case study

    The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of the data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code, yet parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J, a MapReduce framework implemented in Java, is evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically, without requiring modification to application code. The optimizer speeds up the execution of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks. Comment: 8 pages
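
    To make the setting concrete, the classic word-count workload written in the MapReduce style looks roughly like this in plain Java (a hypothetical sketch, not MR4J's actual API). The intermediate per-pair allocations in the map phase are exactly where a semantically aware optimizer can cut both work and garbage-collector pressure:

        import java.util.Map;
        import java.util.stream.Collectors;
        import java.util.stream.Stream;

        public class WordCount {
            // Map phase: split lines into words; reduce phase: count per key.
            public static Map<String, Long> count(Stream<String> lines) {
                return lines
                    .flatMap(line -> Stream.of(line.split("\\s+")))
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
            }
        }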

    A Test Suite for High-Performance Parallel Java

    The Java programming language has a number of features that make it attractive for writing high-quality, portable parallel programs. A pure object formulation, strong typing, and the exception model make programs easier to create, debug, and maintain, and the elegant threading model provides a simple route to parallelism on shared-memory machines. Anticipating great improvements in numerical performance, this paper presents a suite of simple programs that indicate how a pure Java Navier-Stokes solver might perform. The suite includes a parallel Euler solver. We present results from a 32-processor Hewlett-Packard machine and a 4-processor Sun server. While speedup is excellent on both machines, indicating a high-quality thread scheduler, the single-processor performance needs much improvement.
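
    The style of shared-memory parallelism such a suite exercises can be sketched as follows (an illustrative one-dimensional stencil update, not one of the suite's actual kernels):

        // Sketch: partition a grid update across plain Java threads.
        public class ParallelRelax {
            public static void step(double[] u, double[] next, int nThreads)
                    throws InterruptedException {
                Thread[] workers = new Thread[nThreads];
                int chunk = (u.length - 2) / nThreads;
                for (int t = 0; t < nThreads; t++) {
                    final int lo = 1 + t * chunk;
                    final int hi = (t == nThreads - 1) ? u.length - 1 : lo + chunk;
                    workers[t] = new Thread(() -> {
                        for (int i = lo; i < hi; i++)
                            next[i] = 0.5 * (u[i - 1] + u[i + 1]); // stencil update
                    });
                    workers[t].start();
                }
                for (Thread w : workers) w.join(); // barrier at the end of each step
            }
        }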

    Triggering of Just-In-Time Compilation in the Java Virtual Machine

    The Java Virtual Machine (Standard Edition) normally interprets Java bytecode, but it also compiles frequently interpreted Java methods and runs them natively. The purpose is to take advantage of native execution without incurring too much Just-In-Time (JIT) compilation overhead. A former SJSU thesis tried to enhance the standard policy by predicting frequently called methods ahead of their actual frequent interpretation. That project also tried to increase compilation throughput by prioritizing method compilations when there is more than one hot method to compile at the same time. The paper claimed significant speedup. In this project, we re-implemented the previous work on a different platform to see whether we would get the same results. Our re-evaluation showed some speedup for the prediction approach, but only after some adjustments and only for server applications; it also showed some speedup for the prioritizing approach on all the benchmarks. We additionally designed two other enhancements to the original policy: reducing the start-up delay caused by Just-In-Time compilation overhead by postponing some compilations, and increasing the accuracy of detecting frequently interpreted methods that contain nested loops. We could not gain any speedup with the postponing approach, but we did improve performance with the measuring approach.
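
    The policy being tuned here can be sketched as a pair of counters per method (an illustrative Java sketch, not HotSpot's actual implementation). The measuring approach above effectively improves how loop back-edges in nested loops feed into this hotness estimate:

        // Sketch: counter-based triggering of Just-In-Time compilation.
        class HotnessCounter {
            static final int THRESHOLD = 10_000; // cf. HotSpot's -XX:CompileThreshold

            int invocations;  // bumped on each method entry
            int backEdges;    // bumped on each loop back-edge (nested loops matter here)

            boolean shouldCompile() {
                return invocations + backEdges >= THRESHOLD;
            }
        }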