486 research outputs found
Towards co-designed optimizations in parallel frameworks: A MapReduce case study
The explosion of Big Data was followed by the proliferation of numerous
complex parallel software stacks whose aim is to tackle the challenges of data
deluge. A drawback of a such multi-layered hierarchical deployment is the
inability to maintain and delegate vital semantic information between layers in
the stack. Software abstractions increase the semantic distance between an
application and its generated code. However, parallel software frameworks
contain inherent semantic information that general purpose compilers are not
designed to exploit.
This paper presents a case study demonstrating how the specific semantic
information of the MapReduce paradigm can be exploited on multicore
architectures. MR4J has been implemented in Java and evaluated against
hand-optimized C and C++ equivalents. The initial observed results led to the
design of a semantically aware optimizer that runs automatically without
requiring modification to application code.
The optimizer is able to speedup the execution time of MR4J by up to 2.0x.
The introduced optimization not only improves the performance of the generated
code, during the map phase, but also reduces the pressure on the garbage
collector. This demonstrates how semantic information can be harnessed without
sacrificing sound software engineering practices when using parallel software
frameworks.Comment: 8 page
On the generation and analysis of program transformations
This thesis discusses the idea of using domain specific languages for program transformation, and the application, implementation and analysis of one such domain specific
language that combines rewrite rules for transformation and uses temporal logic to express its side conditions. We have conducted three investigations.
- An efficient implementation is described that is able to generate compiler optimizations from temporal logic specifications. Its description is accompanied by an
empirical study of its performance.
- We extend the fundamental ideas of this language to source code in order to write
bug fixing transformations. Example transformations are given that fix common
bugs within Java programs. The adaptations to the transformation language are
described and a sample implementation which can apply these transformations is
provided.
- We describe an approach to the formal analysis of compiler optimizations that
proves that the optimizations do not change the semantics of the program that
they are optimizing. Some example proofs are included.
The result of these combined investigations is greater than the sum of their parts.
By demonstrating that a declarative language may be efficiently applied and formally reasoned about satisfies both theoretical and practical concerns, whilst our extension
towards bug fixing shows more varied uses are possible
Static analysis of energy consumption for LLVM IR programs
Energy models can be constructed by characterizing the energy consumed by
executing each instruction in a processor's instruction set. This can be used
to determine how much energy is required to execute a sequence of assembly
instructions, without the need to instrument or measure hardware.
However, statically analyzing low-level program structures is hard, and the
gap between the high-level program structure and the low-level energy models
needs to be bridged. We have developed techniques for performing a static
analysis on the intermediate compiler representations of a program.
Specifically, we target LLVM IR, a representation used by modern compilers,
including Clang. Using these techniques we can automatically infer an estimate
of the energy consumed when running a function under different platforms, using
different compilers.
One of the challenges in doing so is that of determining an energy cost of
executing LLVM IR program segments, for which we have developed two different
approaches. When this information is used in conjunction with our analysis, we
are able to infer energy formulae that characterize the energy consumption for
a particular program. This approach can be applied to any languages targeting
the LLVM toolchain, including C and XC or architectures such as ARM Cortex-M or
XMOS xCORE, with a focus towards embedded platforms. Our techniques are
validated on these platforms by comparing the static analysis results to the
physical measurements taken from the hardware. Static energy consumption
estimation enables energy-aware software development, without requiring
hardware knowledge
Observable dynamic compilation
Managed language platforms such as the Java Virtual Machine rely on a dynamic compiler to achieve high performance. Despite the benefits that dynamic compilation provides, it also introduces some challenges to program profiling. Firstly, profilers based on bytecode instrumentation may yield wrong results in the presence of an optimizing dynamic compiler, either due to not being aware of optimizations, or because the inserted instrumentation code disrupts such optimizations. To avoid such perturbations, we present a technique to make profilers based on bytecode instrumentation aware of the optimizations performed by the dynamic compiler, and make the dynamic compiler aware of the inserted code. We implement our technique for separating inserted instrumentation code from base-program code in Oracle's Graal compiler, integrating our extension into the OpenJDK Graal project. We demonstrate its significance with concrete profilers. On the one hand, we improve accuracy of existing profiling techniques, for example, to quantify the impact of escape analysis on bytecode-level allocation profiling, to analyze object life-times, and to evaluate the impact of method inlining when profiling method invocations. On the other hand, we also illustrate how our technique enables new kinds of profilers, such as a profiler for non-inlined callsites, and a testing framework for locating performance bugs in dynamic compiler implementations. Secondly, the lack of profiling support at the intermediate representation (IR) level complicates the understanding of program behavior in the compiled code. This issue cannot be addressed by bytecode instrumentation because it cannot precisely capture the occurrence of IR-level operations. Binary instrumentation is not suited either, as it lacks a mapping from the collected low-level metrics to higher-level operations of the observed program. To fill this gap, we present an easy-to-use event-based framework for profiling operations at the IR level. We integrate the IR profiling framework in the Graal compiler, together with our instrumentation-separation technique. We illustrate our approach with a profiler that tracks the execution of memory barriers within compiled code. In addition, using a deoptimization profiler based on our IR profiling framework, we conduct an empirical study on deoptimization in the Graal compiler. We focus on situations which cause program execution to switch from machine code to the interpreter, and compare application performance using three different deoptimization strategies which influence the amount of extra compilation work done by Graal. Using an adaptive deoptimization strategy, we manage to improve the average start-up performance of benchmarks from the DaCapo, ScalaBench, and Octane suites by avoiding wasted compilation work. We also find that different deoptimization strategies have little impact on steady- state performance
- …