
    A Quantitative Evaluation of the Contribution of Native Code to Java Workloads

    Many performance analysis tools for Java focus on tracking executed bytecodes but provide little support for determining the specific contribution of native code libraries. This paper introduces and assesses a portable approach for characterizing the amount of native code executed by Java applications. A profiling agent based on the JVM Tool Interface (JVMTI) accurately keeps track of all runtime transitions between bytecode and native code. It relies on a combination of JVMTI events, Java Native Interface (JNI) function interception, bytecode instrumentation, and hardware performance counters.
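    The core bookkeeping of such an agent can be illustrated with a small model. The Java sketch below is hypothetical and is not the paper's agent (a real JVMTI agent is written in C or C++ against the JVMTI and JNI interfaces). It assumes each bytecode/native transition of a thread is already delivered as an event stamped with a monotonically increasing counter value, such as a cycle count read from a hardware performance counter, and charges every interval between two events to the mode that was active. A stack is used because transitions nest: native code can call back into bytecode through JNI.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-thread accounting of time spent in bytecode vs. native code. */
final class TransitionAccounting {
    enum Mode { BYTECODE, NATIVE }

    private final Deque<Mode> stack = new ArrayDeque<>(); // nesting of active modes
    private final long[] totals = new long[Mode.values().length];
    private long lastStamp;

    TransitionAccounting(long startStamp) {
        stack.push(Mode.BYTECODE); // assumption: the thread starts in bytecode
        lastStamp = startStamp;
    }

    /** Charges the interval since the previous event to the mode on top of the stack. */
    private void charge(long stamp) {
        if (stamp < lastStamp) throw new IllegalArgumentException("stamps must not decrease");
        totals[stack.peek().ordinal()] += stamp - lastStamp;
        lastStamp = stamp;
    }

    /** A transition into mode, e.g. a native method entry or a JNI upcall into Java. */
    void enter(Mode mode, long stamp) {
        charge(stamp);
        stack.push(mode);
    }

    /** Return from the most recently entered mode. */
    void exit(long stamp) {
        if (stack.size() == 1) throw new IllegalStateException("exit without matching enter");
        charge(stamp);
        stack.pop();
    }

    /** Closes the currently open interval, e.g. when the thread ends. */
    void finish(long stamp) { charge(stamp); }

    long total(Mode mode) { return totals[mode.ordinal()]; }

    public static void main(String[] args) {
        TransitionAccounting t = new TransitionAccounting(0);
        t.enter(Mode.NATIVE, 100);   // bytecode ran from 0 to 100
        t.enter(Mode.BYTECODE, 130); // native ran from 100 to 130, then calls back into Java
        t.exit(150);                 // upcall (bytecode) ran from 130 to 150
        t.exit(200);                 // native ran from 150 to 200
        t.finish(260);               // bytecode ran from 200 to 260
        System.out.println(t.total(Mode.BYTECODE) + " bytecode, " + t.total(Mode.NATIVE) + " native");
        // prints "180 bytecode, 80 native"
    }
}
```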

    Workload characterization of JVM languages

    Although it was developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) is nowadays targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Yet despite being targeted by so many languages, the JVM has been tuned for the characteristics of Java programs only: the heuristics for garbage collection and compiler optimizations are geared mainly toward Java programs. In this dissertation, we aim to contribute to the understanding of the workloads imposed on the JVM by both dynamically typed and statically typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting them. We apply our toolchain to applications written in six JVM languages: Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we take a close look at one of the most effective compiler optimizations, method inlining. We present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs when inlining workloads written in different JVM languages.
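    To make the notion of an inlining decision tree concrete, the sketch below encodes a deliberately simplified version of the checks a JIT compiler can apply at a call site. It is not HotSpot's actual logic, which weighs many more conditions (receiver type profiles, the size of already-compiled code, and others). The size limits are modeled on HotSpot's `MaxInlineSize` and `FreqInlineSize` flags and the depth limit on `MaxInlineLevel`; the `CallSite` record, the hot-call-site ratio and all threshold values are hypothetical.

```java
/** Simplified, hypothetical inlining decision tree for a single call site. */
final class SimpleInliner {
    private final int maxInlineSize;   // limit for ordinary call sites (cf. MaxInlineSize)
    private final int freqInlineSize;  // larger limit for hot call sites (cf. FreqInlineSize)
    private final int maxInlineLevel;  // maximum nesting depth (cf. MaxInlineLevel)
    private final double hotCallRatio; // hypothetical cutoff for calling a site "hot"

    SimpleInliner(int maxInlineSize, int freqInlineSize, int maxInlineLevel, double hotCallRatio) {
        this.maxInlineSize = maxInlineSize;
        this.freqInlineSize = freqInlineSize;
        this.maxInlineLevel = maxInlineLevel;
        this.hotCallRatio = hotCallRatio;
    }

    /** Walks the tree: depth check, then receiver check, then size check. */
    InlineDecision decide(CallSite site) {
        if (site.depth() > maxInlineLevel) return InlineDecision.TOO_DEEP;
        // In this model, a virtual call with several observed receiver
        // types has no single target to inline.
        if (!site.monomorphic()) return InlineDecision.NOT_MONOMORPHIC;
        // Hot call sites may inline larger callees than cold ones.
        boolean hot = site.callsPerInvocation() >= hotCallRatio;
        int limit = hot ? freqInlineSize : maxInlineSize;
        return site.calleeBytecodeSize() <= limit ? InlineDecision.INLINE : InlineDecision.TOO_BIG;
    }

    public static void main(String[] args) {
        SimpleInliner inliner = new SimpleInliner(35, 325, 15, 10.0); // illustrative values
        System.out.println(inliner.decide(new CallSite(100, 1, true, 1.0)));  // TOO_BIG
        System.out.println(inliner.decide(new CallSite(100, 1, true, 50.0))); // INLINE
    }
}

/** Possible outcomes of the simplified inlining decision. */
enum InlineDecision { INLINE, TOO_DEEP, NOT_MONOMORPHIC, TOO_BIG }

/**
 * Hypothetical call-site summary: callee size in bytecodes, current inlining
 * depth, whether a single receiver type was observed, and how many times the
 * call site executed per invocation of the caller.
 */
record CallSite(int calleeBytecodeSize, int depth, boolean monomorphic, double callsPerInvocation) {}
```

    The order of the checks matters: a call site rejected for depth or for receiver polymorphism is never examined for size, so per-outcome counts depend on how the tree is arranged.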

    Towards Loosely-Coupled Programming on Petascale Systems

    We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying this approach so that petascale systems can be used more easily and by a broader user community. Our work enables the execution of highly parallel computations composed of loosely coupled serial jobs, with no modifications to the respective applications. This approach allows a new, and potentially far larger, class of applications to leverage petascale systems such as the IBM Blue Gene/P supercomputer. We present the I/O performance challenges encountered in making this model practical, and show results using both microbenchmarks and real applications from two domains: economic energy modeling and molecular dynamics. Our benchmarks show that we can scale up to 160K processor cores with high efficiency and achieve sustained execution rates of thousands of tasks per second.
    Comment: IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SuperComputing/SC) 200
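    The programming model, many independent serial tasks handed to whichever worker is free, can be illustrated on a single machine. The Java sketch below is only an analogy and not Falkon: a local `ExecutorService` thread pool stands in for the distributed dispatcher, and `ManyTaskRunner`, `runTasks` and the `job` function are hypothetical. Each task runs unmodified, and the runner reports the completion count and a tasks-per-second rate, the throughput metric used in the abstract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

/** Local analogy of many-task dispatch: independent serial jobs on a worker pool. */
final class ManyTaskRunner {
    /** How many tasks completed, a checksum of their results, and the observed rate. */
    record Result(int completed, long checksum, double tasksPerSecond) {}

    static Result runTasks(int numTasks, int workers, IntUnaryOperator job) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long start = System.nanoTime();
            List<Future<Integer>> futures = new ArrayList<>(numTasks);
            for (int i = 0; i < numTasks; i++) {
                final int taskId = i;
                // Each task runs the job unchanged; tasks share no state.
                Callable<Integer> task = () -> job.applyAsInt(taskId);
                futures.add(pool.submit(task));
            }
            int completed = 0;
            long checksum = 0;
            for (Future<Integer> f : futures) {
                try {
                    checksum += f.get(); // waits for this task to finish
                    completed++;
                } catch (ExecutionException e) {
                    throw new IllegalStateException("task failed", e.getCause());
                }
            }
            double seconds = Math.max((System.nanoTime() - start) / 1e9, 1e-9);
            return new Result(completed, checksum, completed / seconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Result r = runTasks(10_000, 8, id -> id % 7);
        System.out.printf("%d tasks at %.0f tasks/s%n", r.completed(), r.tasksPerSecond());
    }
}
```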

    Quantifying and Predicting the Influence of Execution Platform on Software Component Performance

    The performance of software components depends on several factors, including the execution platform on which they run. To simplify cross-platform performance prediction in relocation and sizing scenarios, this thesis introduces a novel approach that separates the application performance profile from the platform performance profile. The approach is evaluated using transparent instrumentation of Java applications and automated benchmarks for Java Virtual Machines.
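    One way to picture this separation is a simple linear model, an assumption made here for illustration and not necessarily the thesis's model: the application profile counts how often each kind of operation executes, independently of the platform; the platform profile gives a benchmarked cost per operation on a given JVM and machine; and the predicted time is the sum of count times cost over all operations. The operation names, costs, `CrossPlatformPredictor` and `predictNanos` are hypothetical.

```java
import java.util.Map;

/** Hypothetical linear model combining an application profile with a platform profile. */
final class CrossPlatformPredictor {
    /** Predicted time = sum over operations of (count in app profile) * (cost on platform). */
    static double predictNanos(Map<String, Long> appProfile, Map<String, Double> platformCostNanos) {
        double total = 0.0;
        for (Map.Entry<String, Long> e : appProfile.entrySet()) {
            Double cost = platformCostNanos.get(e.getKey());
            if (cost == null) {
                throw new IllegalArgumentException("platform profile lacks operation: " + e.getKey());
            }
            total += e.getValue() * cost;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up application profile, collected once by instrumentation.
        Map<String, Long> app = Map.of("invokevirtual", 1_000_000L, "newObject", 200_000L);
        // Made-up per-operation costs (ns) benchmarked on two platforms.
        Map<String, Double> platformA = Map.of("invokevirtual", 2.0, "newObject", 15.0);
        Map<String, Double> platformB = Map.of("invokevirtual", 1.5, "newObject", 25.0);
        System.out.println(predictNanos(app, platformA)); // 5000000.0
        System.out.println(predictNanos(app, platformB)); // 6500000.0
    }
}
```

    With these made-up numbers, platform A is predicted to be faster for this application even though platform B executes calls more cheaply, because the application allocates heavily. Predicting for a new platform needs only that platform's cost table, not a re-run of the application.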

    Dependability Metrics: Research Workshop Proceedings

    Justifying reliance on computer systems is based on some form of evidence about such systems. This in turn implies the existence of scientific techniques to derive such evidence from given systems or to predict such evidence for systems. In a general sense, these techniques imply a form of measurement. The workshop "Dependability Metrics", which was held on November 10, 2008, at the University of Mannheim, dealt with all aspects of measuring dependability.

    Three pitfalls in Java performance evaluation

    The Java programming language has seen remarkable growth over the last decade. This is partially due to the infrastructure required to run Java applications on general-purpose microprocessors: a Java virtual machine (VM). The VM ensures that Java applications are portable across different hardware platforms, because it shields the applications from the underlying system. Hence the motto "write once, run (almost) anywhere". Java applications are compiled to an intermediate form, called bytecode, and consist of a number of so-called class files. The virtual machine takes care of class loading, interpreting or compiling the bytecode to the native code of the underlying hardware platform, thread scheduling, garbage collection, etc. As such, during the execution of a Java application, the VM regularly intervenes to take care of housekeeping tasks and to optimise the application as it is executing. Furthermore, the implementation details of most virtual machines introduce non-deterministic behaviour, not into the semantic part of the execution, but into the lower-level execution. For example, to bring a Java application up to a speed competitive with classical compiled programs written in languages such as C, the virtual machine needs to optimise Java bytecode. To limit the execution overhead, most virtual machines use a time-sampling mechanism to determine the hot methods in the application. This introduces non-determinism: over several runs, the methods are not always optimised at the same moment, nor is the set of optimised methods always the same. Other factors that introduce non-determinism include thread scheduling and garbage collection. Performance analysis of Java applications is therefore not as simple as it first seems, and warrants closer inspection. In this dissertation we are mainly interested in the behaviour of Java applications and their performance.
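    The time-sampling mechanism described above can be sketched as follows. At every timer tick the VM notes which method is executing, and a method becomes a compilation candidate once its sample count reaches a threshold. This is a simplified model with a hypothetical threshold and hypothetical method names, not the policy of any particular VM. Because ticks are driven by time rather than by program events, two runs of the same program see different sample sequences, so both the set of promoted methods and the moment each is promoted can differ.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified sampling-based hot-method detection (hypothetical threshold policy). */
final class SamplingHotMethodDetector {
    private final int threshold;                          // samples needed for promotion
    private final Map<String, Integer> samples = new HashMap<>();
    private final Set<String> hot = new LinkedHashSet<>(); // keeps promotion order

    SamplingHotMethodDetector(int threshold) { this.threshold = threshold; }

    /** Called on each timer tick with the method found executing at that moment. */
    void onSample(String method) {
        int count = samples.merge(method, 1, Integer::sum);
        if (count == threshold) {
            hot.add(method); // a real VM would queue the method for optimisation here
        }
    }

    Set<String> hotMethods() { return hot; }

    /** Replays one run's sequence of timer samples and returns the promoted methods. */
    static Set<String> replay(List<String> ticks, int threshold) {
        SamplingHotMethodDetector d = new SamplingHotMethodDetector(threshold);
        ticks.forEach(d::onSample);
        return d.hotMethods();
    }

    public static void main(String[] args) {
        // Two runs of the same program; the timer ticks land at different points.
        List<String> run1 = List.of("parse", "hash", "parse", "hash", "parse", "sort");
        List<String> run2 = List.of("hash", "parse", "hash", "sort", "hash", "parse");
        System.out.println(replay(run1, 3)); // [parse]
        System.out.println(replay(run2, 3)); // [hash]
    }
}
```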
    In the course of this work, we uncovered three major pitfalls that researchers had not taken into account when analysing Java performance prior to this work. We briefly summarise the main achievements presented in this dissertation below.
    The first pitfall involves the interaction between the virtual machine, the application and the input to the application. The performance of short-running applications is shown to be determined mainly by the virtual machine. For longer-running applications this influence decreases, but remains tangible. We use statistical analysis, such as principal components analysis and cluster analysis (K-means and hierarchical clustering), to demonstrate and clarify the pitfall. Using a large number of performance characteristics measured with hardware performance counters, five virtual machines and fourteen benchmarks with both a small and a large input size, we demonstrate that short-running workloads are primarily clustered by virtual machine. Even for long-running applications from the SPECjvm98 benchmark suite, the virtual machine still exerts a large influence on the observed behaviour at the microarchitectural level. This work has shown the need for both larger and longer-running benchmarks than were previously available (a need partially met by the introduction of the DaCapo benchmark suite), as well as for careful experimental setup to avoid measuring the virtual machine rather than the benchmark. Prior to this work, researchers quite often explored Java performance using simulation with short-running applications (to save time).
    The second pitfall involves the analysis of performance numbers. In a survey of 50 papers published over the past seven years at premier conferences such as OOPSLA, PLDI, CGO, ISMM and VEE, we found that a variety of approaches are used, both for experimental design (for example, the input size, virtual machines and heap sizes) and, even more importantly, for data analysis (for example, reporting the best-of-three performance number). New techniques are pitted against existing work using these prevalent approaches, and conclusions about whether they beat the prior state of the art are based upon them. Given that the execution of Java applications usually involves non-determinism in the virtual machine, for example when deciding which methods to optimise, it should come as no surprise that the lack of statistical rigour in these prevalent approaches leads to misleading or even incorrect conclusions. By this we mean that the conclusions are either not representative of what actually happens, or even contradict reality as modelled statistically. To circumvent this pitfall, we propose a rigorous statistical approach that uses confidence intervals both to report and to compare performance numbers. We also argue that enough experiments must be conducted to obtain a reliable performance measure. The non-determinism caused by the timer-based optimisation component of a virtual machine can be eliminated using so-called replay compilation. This technique records a compilation plan during a first execution, or profiling run, of the application. During a second execution, the application is iterated twice: once to compile and optimise all methods found in the compilation plan, and a second time to perform the actual measurement. It turns out, however, that the current practices, using either a single plan (corresponding to the best-performing profiling run) or a combined plan that selects the methods optimised in, say, more than half of the profiling runs, are no match for using multiple plans. The variability observed among the plans themselves is too large for either current practice to capture. Consequently, using multiple plans is clearly the better option. Moreover, it allows a matched-pair approach in the data analysis, which results in tighter confidence intervals for the mean performance number.
    The third pitfall we examine is the use of global performance numbers when tuning either an application or a virtual machine. We show that Java applications exhibit phase behaviour at the method level: instances of the same method behave more similarly to each other than to instances of other methods. A phase can then be identified as a set of sub-trees of the dynamic call tree, each sub-tree headed by the same method. We present a two-step algorithm that determines the phases in step 1 and correlates hardware performance counter data with them in step 2. The resulting information can show the programmer which methods perform worse than average, for example with respect to the number of cache misses they incur. Throughout the dissertation, we pay particular attention to statistical rigour: for each pitfall, we use statistics to demonstrate its presence. We hope this work will encourage other researchers to apply more rigour in their work as well.
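    The confidence-interval methodology advocated here can be sketched in a few lines. The sketch assumes independent, approximately normally distributed measurements and takes the Student t critical value as a parameter, since the Java standard library has no t-distribution (3.182 is the two-sided 95% value for three degrees of freedom, i.e. four measurements). `PerfStats` and its methods are hypothetical names. Non-overlapping intervals are treated as a significant difference; overlapping intervals leave this simple test inconclusive. The matched-pair variant instead builds one interval over the per-plan differences and checks whether it excludes zero.

```java
/** Confidence intervals for comparing performance measurements (sketch). */
final class PerfStats {
    /** A two-sided confidence interval around a sample mean. */
    record Interval(double mean, double lower, double upper) {
        boolean overlaps(Interval other) {
            return lower <= other.upper && other.lower <= upper;
        }

        boolean excludesZero() {
            return lower > 0 || upper < 0;
        }
    }

    /** mean +/- t * s / sqrt(n), where s is the sample standard deviation. */
    static Interval confidenceInterval(double[] xs, double tCritical) {
        int n = xs.length;
        if (n < 2) throw new IllegalArgumentException("need at least two measurements");
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= n;
        double sumSq = 0;
        for (double x : xs) sumSq += (x - mean) * (x - mean);
        double s = Math.sqrt(sumSq / (n - 1));
        double half = tCritical * s / Math.sqrt(n);
        return new Interval(mean, mean - half, mean + half);
    }

    /**
     * Matched-pair analysis: a[i] and b[i] were measured under the same
     * condition (here, the same compilation plan), so only differences matter.
     */
    static Interval pairedDifference(double[] a, double[] b, double tCritical) {
        if (a.length != b.length) throw new IllegalArgumentException("data must be paired");
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) d[i] = a[i] - b[i];
        return confidenceInterval(d, tCritical);
    }

    public static void main(String[] args) {
        // Made-up run times (s) of systems A and B, one run per compilation plan.
        double[] a = {10.0, 12.0, 14.0, 16.0};
        double[] b = {9.0, 10.8, 13.2, 15.0};
        double t95df3 = 3.182; // two-sided 95% Student t value, 3 degrees of freedom
        Interval ia = confidenceInterval(a, t95df3);
        Interval ib = confidenceInterval(b, t95df3);
        System.out.println(ia.overlaps(ib));                               // true: inconclusive
        System.out.println(pairedDifference(a, b, t95df3).excludesZero()); // true: A slower than B
    }
}
```

    In this made-up example the plan-to-plan variation dominates each system's own interval, while the per-plan differences all lie close to one second, so pairing removes the shared variation and yields a conclusive comparison.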

    The embedded Java benchmark suite JemBench


    GiViP: A Visual Profiler for Distributed Graph Processing Systems

    Analyzing large-scale graphs provides valuable insights in different application scenarios. While many graph processing systems working on top of distributed infrastructures have been proposed to deal with big graphs, the tasks of profiling and debugging their massive computations remain time-consuming and error-prone. This paper presents GiViP, a visual profiler for distributed graph processing systems based on a Pregel-like computation model. GiViP captures the huge number of messages exchanged throughout a computation and provides an interactive user interface for the visual analysis of the collected data. We show how to take advantage of GiViP to detect anomalies related to the computation and to the infrastructure, such as slow computing units and anomalous message patterns.
    Comment: Appears in the Proceedings of the 25th International Symposium on Graph Drawing and Network Visualization (GD 2017)
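    As a rough illustration of one check such a profiler supports, and not GiViP's own method (which relies on interactive visual analysis), the sketch below flags slow computing units: given per-worker compute times for one superstep of a Pregel-like computation, it reports workers slower than a fixed multiple of that superstep's median time. The median-based rule, the factor, the sample times and `SuperstepOutliers` are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical slow-worker check for one superstep of a Pregel-like computation. */
final class SuperstepOutliers {
    /** Indices of workers whose compute time exceeds factor times the median time. */
    static List<Integer> slowWorkers(double[] computeMillis, double factor) {
        if (computeMillis.length == 0) return List.of();
        double[] sorted = computeMillis.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        // The median is robust to the very outliers we are looking for.
        double median = n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        List<Integer> slow = new ArrayList<>();
        for (int w = 0; w < n; w++) {
            if (computeMillis[w] > factor * median) slow.add(w);
        }
        return slow;
    }

    public static void main(String[] args) {
        // Made-up compute times (ms) of five workers in one superstep.
        double[] superstep = {120, 118, 125, 410, 122};
        System.out.println(slowWorkers(superstep, 2.0)); // [3]
    }
}
```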