521 research outputs found

    Coz: Finding Code that Counts with Causal Profiling

    Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus "virtually" speeding it up. We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.
    Comment: Published at SOSP 2015 (Best Paper Award).
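    The virtual-speedup identity at the heart of the abstract above can be illustrated with a deterministic toy model. This is only a sketch of the idea, not Coz's sampling-based implementation, and all costs and iteration counts are made up:

```python
# Toy model of causal profiling's virtual speedup (not Coz itself).
# Two threads run in parallel: thread 1 executes line L n times at cost
# t_L each, plus other work; thread 2 runs independent work. End-to-end
# runtime is determined by the slower thread.

def runtime(thread1_s, thread2_s):
    return max(thread1_s, thread2_s)

def real_speedup(n, t_L, d, other1, other2):
    # Actually make L faster by d per execution.
    return runtime(n * (t_L - d) + other1, other2)

def virtual_speedup(n, t_L, d, other1, other2):
    # Leave L unchanged; pause the *other* thread by d each time L runs,
    # then subtract the total inserted delay from the measured runtime.
    measured = runtime(n * t_L + other1, other2 + n * d)
    return measured - n * d

for d in (0.0, 0.1, 0.5):
    r = real_speedup(100, 1.0, d, other1=20.0, other2=90.0)
    v = virtual_speedup(100, 1.0, d, other1=20.0, other2=90.0)
    assert abs(r - v) < 1e-9  # both predict the same end-to-end runtime
```

    In this model, pausing everything else whenever L runs shifts the whole schedule by the same amount that actually speeding up L would have saved, which is why the two estimates coincide.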

    Optimizing JVM profiling performance for Honest Profiler

    Honest Profiler is a profiling tool which extracts performance information from applications running on the Java Virtual Machine. This information helps to locate performance bottlenecks in the observed application. This thesis aims to provide solutions that increase the amount of useful information extracted by Honest Profiler; collecting more data makes the measured performance results more accurate. The thesis covers the basics of sampling profilers and the architecture of Honest Profiler, and measures the performance of Honest Profiler's data-collection logic. As its main result, three different solutions for increasing the profiler's information output are presented. Their performance and the amount of information they extract are evaluated with a benchmark test.
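    The sampling approach the abstract refers to can be sketched in a few lines. This toy Python sampler is only an analogue of what Honest Profiler does natively on the JVM; the sampling interval and the workload here are arbitrary:

```python
import collections
import sys
import threading
import time

# Minimal sketch of a sampling profiler: a background thread periodically
# inspects the target thread's stack and counts which function is on top.
samples = collections.Counter()
stop = threading.Event()

def sampler(target_ident, interval_s=0.001):
    while not stop.is_set():
        frame = sys._current_frames().get(target_ident)
        if frame is not None:
            samples[frame.f_code.co_name] += 1  # attribute sample to top frame
        time.sleep(interval_s)

def busy(deadline):
    while time.time() < deadline:
        sum(range(1000))  # the "hot" work we expect to dominate the samples

t = threading.Thread(target=sampler, args=(threading.main_thread().ident,))
t.start()
busy(time.time() + 0.2)
stop.set()
t.join()
print(samples.most_common(3))  # 'busy' should receive most samples
```

    The amount and quality of information such a profiler yields depends directly on how many samples it can take and how much context each sample carries, which is the dimension this thesis works on.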

    Observable dynamic compilation

    Managed language platforms such as the Java Virtual Machine rely on a dynamic compiler to achieve high performance. Despite the benefits that dynamic compilation provides, it also introduces some challenges to program profiling. Firstly, profilers based on bytecode instrumentation may yield wrong results in the presence of an optimizing dynamic compiler, either due to not being aware of optimizations, or because the inserted instrumentation code disrupts such optimizations. To avoid such perturbations, we present a technique to make profilers based on bytecode instrumentation aware of the optimizations performed by the dynamic compiler, and make the dynamic compiler aware of the inserted code. We implement our technique for separating inserted instrumentation code from base-program code in Oracle's Graal compiler, integrating our extension into the OpenJDK Graal project. We demonstrate its significance with concrete profilers. On the one hand, we improve accuracy of existing profiling techniques, for example, to quantify the impact of escape analysis on bytecode-level allocation profiling, to analyze object lifetimes, and to evaluate the impact of method inlining when profiling method invocations. On the other hand, we also illustrate how our technique enables new kinds of profilers, such as a profiler for non-inlined callsites, and a testing framework for locating performance bugs in dynamic compiler implementations. Secondly, the lack of profiling support at the intermediate representation (IR) level complicates the understanding of program behavior in the compiled code. This issue cannot be addressed by bytecode instrumentation because it cannot precisely capture the occurrence of IR-level operations. Binary instrumentation is not suited either, as it lacks a mapping from the collected low-level metrics to higher-level operations of the observed program.
To fill this gap, we present an easy-to-use event-based framework for profiling operations at the IR level. We integrate the IR profiling framework in the Graal compiler, together with our instrumentation-separation technique. We illustrate our approach with a profiler that tracks the execution of memory barriers within compiled code. In addition, using a deoptimization profiler based on our IR profiling framework, we conduct an empirical study on deoptimization in the Graal compiler. We focus on situations which cause program execution to switch from machine code to the interpreter, and compare application performance using three different deoptimization strategies which influence the amount of extra compilation work done by Graal. Using an adaptive deoptimization strategy, we manage to improve the average start-up performance of benchmarks from the DaCapo, ScalaBench, and Octane suites by avoiding wasted compilation work. We also find that different deoptimization strategies have little impact on steady-state performance.

    How accurately do Java profilers predict runtime performance bottlenecks?


    Profiling tools for Java

    Integrated Master's dissertation in Informatics Engineering. Java is currently one of the most popular programming languages. This popularity is, in part, due to the portability it offers, which comes from the fact that Java source code is compiled into bytecode that can be executed by a compatible Java Virtual Machine (JVM) on any system. The JVM can then directly interpret the Java application or compile it into machine code. However, this execution on top of a virtual machine creates some obstacles for developers looking to profile their applications. Profilers are valuable tools for developers who seek to understand an application's behaviour by collecting metrics about its execution.
    Obtaining accurate profiles of an application is important, but they can be challenging to obtain and to analyse, particularly for parallel applications. This dissertation suggests an optimisation workflow to employ in the pursuit of reducing scalability bottlenecks in parallel Java applications. The workflow is designed to simplify the discovery of the performance problems affecting a given parallel application and to suggest actions to investigate them further. The suggested workflow relies on possible speedups to quantify the impact of different performance problems. The idea of possible speedups is to estimate the speedup an application could achieve if a specific performance problem were completely removed. This estimate is computed from metrics collected while profiling the parallel application and a sequential version of it. The set of performance problems considered includes workload imbalance, parallelism overhead due to an increase in the number of executed instructions, synchronisation overhead, memory-access bottlenecks, and the fraction of sequential work. These were deemed the most common causes of scalability issues in parallel applications. To further investigate the effect of these problems on a parallel application, some visualisations of the application's behaviour are suggested depending on which problem limits scalability the most. The suggested visualisations mostly consist of different flame graphs of the application's profile. Two tools were developed to help apply this optimisation workflow to parallel Java applications. One relies on async-profiler to collect profiles of a given Java application. The other uses the profiles collected by the first tool to estimate possible speedups and to produce all visualisations mentioned in the suggested workflow. Finally, the workflow was validated on multiple case studies.
    The main case study was the iterative optimisation of a K-means algorithm, starting from a sequential implementation and resulting in a gradual increase of the application's scalability. Additional case studies are also presented to highlight paths not covered in the main case study.
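    The possible-speedups idea described in this abstract can be sketched with Amdahl-style arithmetic. The formulas below are an assumed, simplified formulation for illustration; the dissertation defines its own per-problem estimators:

```python
# Toy "possible speedup" estimates (assumed formulas, for illustration).

def possible_speedup(parallel_runtime_s, overhead_s):
    # Speedup if the time attributed to one problem (e.g. synchronisation
    # overhead on the critical path) were removed entirely.
    assert 0.0 <= overhead_s < parallel_runtime_s
    return parallel_runtime_s / (parallel_runtime_s - overhead_s)

def amdahl_bound(sequential_fraction, threads):
    # Upper bound on speedup given the fraction of sequential work,
    # relevant to the "fraction of sequential work" problem above.
    return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / threads)

print(possible_speedup(10.0, 2.0))       # removing 2 s of overhead from a 10 s run
print(round(amdahl_bound(0.1, 8), 2))    # 10% sequential work on 8 threads
```

    Ranking the problems by their estimated possible speedup is what lets the workflow point the developer at the visualisation (typically a flame graph variant) for the most limiting one first.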

    DYNJA: a dynamic resource analyzer for multi-threaded Java

    We present the concepts, usage, and prototypical implementation of Dynja, a dynamic resource analyzer for multi-threaded Java. The system receives as input a Java application, initial values for its input parameters, and the cost metrics to be measured among the three metrics currently available: number of executed bytecode instructions, number (and type) of objects created, and number (and name) of methods invoked. Dynja yields as output the resources consumed by each thread according to the selected metric(s). Our dynamic resource analyzer has been implemented using the Java Virtual Machine Tool Interface (JVMTI), a native programming interface which allows inspecting the state and controlling the execution of applications running in a JVM.
The main conclusions of this work have been submitted for assessment to the conference "Principles and Practice of Programming in Java (PPPJ'13)" and are currently under review. The article can be found in the appendix.
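    Dynja's per-thread method-invocation metric can be mimicked in a few lines. This sketch uses Python's tracing hooks rather than JVMTI, purely to illustrate the idea of attributing counts to the thread that incurred them:

```python
import collections
import threading

# Per-thread method-invocation counting, a Python analogue of one Dynja
# metric (Dynja itself does this natively through JVMTI).
counts = collections.defaultdict(collections.Counter)

def tracer(frame, event, arg):
    if event == "call":
        counts[threading.current_thread().name][frame.f_code.co_name] += 1
    return None  # no per-line tracing needed

def helper():
    pass

def work(n):
    for _ in range(n):
        helper()

threading.settrace(tracer)  # installed in threads started after this call
threads = [threading.Thread(target=work, args=(5,), name=f"w{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threading.settrace(None)

print({name: dict(c) for name, c in counts.items()})
```

    Each worker thread ends up with its own counter (here, five `helper` calls per thread), which is exactly the per-thread attribution Dynja reports for its metrics.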

    Energy-Aware Development and Labeling for Mobile Applications

    Today, mobile devices such as smart phones and tablets have become ubiquitous. Millions of software applications can be purchased and installed on these devices, customizing them to personal interests and needs. However, the frequent use of mobile devices has made a new problem omnipresent: their limited operation time, due to their limited energy capacities. Although energy consumption can be considered a hardware problem, the amount of energy required by today's mobile devices depends heavily on their current workloads and is thus highly influenced by the software running on them. Thus, although only hardware modules consume energy, operating systems, middleware services, and mobile applications strongly influence the energy consumption of mobile devices, depending on how efficiently they use and control hardware modules. Nevertheless, most of today's mobile applications totally ignore their influence on the devices' energy consumption, leading to wasted energy, shorter operation times, and thus frustrated application users. A major reason for this energy-unawareness is the lack of appropriate tooling for the development of energy-aware mobile applications. As many of today's mobile applications behave energy-unaware and various mobile applications providing similar services exist, users aim to optimize their devices by installing applications known to be energy-saving or energy-aware, meaning that they consume less energy while providing the same services as their competitors. However, scarce information on the applications' energy usage is available and, thus, users are forced to install and try many applications manually before finding those that fulfill their personal functional, non-functional, and energy requirements.
This thesis addresses the lack of tooling for the development of energy-aware mobile applications and the lack of comparability of mobile applications in terms of energy-awareness with two contributions: First, it proposes JouleUnit, an energy profiling and testing framework that uses unit tests to execute application workloads while profiling their energy consumption in parallel. By extending a well-known testing concept and providing tooling integrated into the Eclipse development environment, JouleUnit has a low learning curve and integrates easily into existing development and testing processes. Second, for the comparability of mobile applications in terms of energy efficiency, this thesis proposes an energy benchmarking and labeling service. Mobile applications belonging to the same usage domain are energy-profiled while executing a usage-domain-specific benchmark in parallel. Thus, their energy consumption for specific use cases can be evaluated and compared afterwards. To abstract and summarize the profiling results, energy labels are derived that summarize an application's energy consumption over all evaluated use cases as a simple energy grade, ranging from A to G. Besides, users can decide how to weigh specific use cases for the computation of energy grades, as it is likely that different users use the same applications differently. The energy labeling service has been implemented for Android applications and evaluated for three usage domains (web browsers, email clients, and live wallpapers), showing that different mobile applications indeed differ in their energy consumption for the same services and, thus, that their comparison is both possible and sensible. To the best of my knowledge, this is the first approach that provides mobile application users comparable energy consumption information without installing and testing applications on their own devices.
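    The labeling step could, for example, look like the following sketch. The per-use-case weighting, the normalisation against the best application in the domain, and the bucket thresholds are all assumptions made for illustration, not the thesis's actual scheme:

```python
# Hypothetical A..G energy-grade computation (assumed scheme, for illustration).

def energy_grade(consumption_by_use_case, weights, best_in_domain):
    # Weighted average energy for this app, normalised against the best
    # competitor in the same usage domain (ratio >= 1.0 means "worse").
    weighted = sum(consumption_by_use_case[u] * w for u, w in weights.items())
    ratio = weighted / best_in_domain
    thresholds = [1.1, 1.25, 1.5, 1.75, 2.0, 2.5]  # assumed bucket edges
    for grade, limit in zip("ABCDEF", thresholds):
        if ratio <= limit:
            return grade
    return "G"

# A user who browses and watches video equally; 18.0 J is the assumed
# best-in-domain consumption for this weighted workload.
print(energy_grade({"browse": 10.0, "video": 30.0},
                   {"browse": 0.5, "video": 0.5},
                   best_in_domain=18.0))
```

    Letting users supply their own `weights` captures the thesis's point that different users exercise the same application differently, so the same measurements can yield different grades per user profile.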