265 research outputs found

    Simple and Effective Type Check Removal through Lazy Basic Block Versioning

    Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has led to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. This does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses, and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler. This approach is compared with a classical flow-based type analysis. Lazy basic block versioning performs as well or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity, and good run-time performance makes basic block versioning attractive for baseline JIT compilers.
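
    As a rough illustration of the mechanism described above, the sketch below caches one specialized version of a basic block per incoming type context and generates missing versions lazily on first entry; a context that already proves a variable's type lets the corresponding guard be dropped. It is a minimal C++ sketch of the core idea, not the paper's JavaScript implementation: the names (Type, TypeContext, get_version) are invented for illustration, and plain lambdas stand in for generated machine code.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <string>
    #include <tuple>

    enum class Type { Unknown, Int };                    // toy type lattice

    using TypeContext = std::map<std::string, Type>;     // variable name -> known type
    using BlockId = int;

    struct VersionKey {
        BlockId block;
        TypeContext ctx;
        bool operator<(const VersionKey& o) const {
            return std::tie(block, ctx) < std::tie(o.block, o.ctx);
        }
    };

    // Cache of block versions, created lazily the first time a branch reaches
    // a block with a given incoming type context.
    std::map<VersionKey, std::function<void()>> versions;

    std::function<void()> generate_version(BlockId b, const TypeContext& ctx) {
        auto it = ctx.find("x");
        bool x_is_int = (it != ctx.end() && it->second == Type::Int);
        return [b, x_is_int] {
            // If the context does not prove the type, a dynamic test is emitted;
            // the successor block would then be requested with a refined context.
            if (!x_is_int) std::printf("block %d: emit type test for x\n", b);
            std::printf("block %d: body specialized for x:%s\n",
                        b, x_is_int ? "Int" : "Unknown");
        };
    }

    std::function<void()>& get_version(BlockId b, const TypeContext& ctx) {
        VersionKey key{b, ctx};
        auto it = versions.find(key);
        if (it == versions.end())
            it = versions.emplace(key, generate_version(b, ctx)).first;
        return it->second;
    }

    int main() {
        get_version(1, {})();                            // unknown type: guard kept
        get_version(1, {{"x", Type::Int}})();            // proven Int: guard removed
    }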

    Automatic Performance Optimization of Stencil Codes

    Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize since only few computations are performed while a comparatively large number of values have to be accessed; i.e., stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed. In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling is able to increase the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target basically any stencil code, while others are more specialized. For example, support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements; on the other hand, it simplifies an explicit vectorization. Other notable optimizations described in detail are redundancy elimination techniques to eliminate common subexpressions both in a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations are able to increase the performance not only of the model problem given by Poisson's equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal and non-Newtonian fluid flow.
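
    To make the structure of such a kernel concrete, here is a minimal C++ sketch of a 3D 7-point Jacobi sweep with simple spatial blocking of the loop nest. It only illustrates the shape of the computation and plain spatial tiling; the grid size, tile size, and function names are illustrative, and the paper's generator additionally applies time tiling, layout transformations, and vectorization automatically.

    #include <algorithm>
    #include <utility>
    #include <vector>

    constexpr int N = 64;          // grid points per dimension (including boundary)
    constexpr int TILE = 16;       // spatial tile size, a tunable parameter

    inline int idx(int i, int j, int k) { return (i * N + j) * N + k; }

    void jacobi_sweep(const std::vector<double>& u, std::vector<double>& u_new) {
        for (int ii = 1; ii < N - 1; ii += TILE)
            for (int jj = 1; jj < N - 1; jj += TILE)
                for (int kk = 1; kk < N - 1; kk += TILE)
                    // Work on one tile at a time to improve cache reuse.
                    for (int i = ii; i < std::min(ii + TILE, N - 1); ++i)
                        for (int j = jj; j < std::min(jj + TILE, N - 1); ++j)
                            for (int k = kk; k < std::min(kk + TILE, N - 1); ++k)
                                u_new[idx(i, j, k)] = (1.0 / 6.0) *
                                    (u[idx(i - 1, j, k)] + u[idx(i + 1, j, k)] +
                                     u[idx(i, j - 1, k)] + u[idx(i, j + 1, k)] +
                                     u[idx(i, j, k - 1)] + u[idx(i, j, k + 1)]);
    }

    int main() {
        std::vector<double> u(N * N * N, 0.0), u_new(N * N * N, 0.0);
        for (int t = 0; t < 10; ++t) {     // repeated sweeps; buffers swap each step
            jacobi_sweep(u, u_new);
            std::swap(u, u_new);
        }
    }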

    An Efficient and Flexible Implementation of Aspect-Oriented Languages

    Compilers for modern object-oriented programming languages generate code in a platform-independent intermediate language preserving the concepts of the source language; for example, classes, fields, methods, and virtual or static dispatch can be directly identified within the intermediate code. To execute this intermediate code, state-of-the-art implementations of virtual machines perform just-in-time (JIT) compilation of the intermediate language; i.e., the virtual instructions in the intermediate code are compiled to native machine code at runtime. In this step, a declarative representation of source language concepts in the intermediate language facilitates highly efficient adaptive and speculative optimization of the running program, which may not be possible otherwise. In contrast, constructs of aspect-oriented languages - which improve the separation of concerns - are commonly realized by compiling them to conventional intermediate language instructions or by driving transformations of the intermediate code, which is called weaving. This way, the semantics of the aspect-oriented constructs is not preserved in a declarative manner at the intermediate language level. This representational gap between aspect-oriented concepts in the source code and in the intermediate code hinders high-performance optimizations and weakens features of software engineering processes like debugging support or the continuity property of incremental compilation: modifying an aspect in the source code potentially requires re-weaving multiple other modules. To leverage language implementation techniques for aspect-oriented languages, this thesis proposes the Aspect-Language Implementation Architecture (ALIA), which prescribes - among other things - the existence of an intermediate representation preserving the aspect-oriented constructs of the source program. A central component of this architecture is an extensible and flexible meta-model of aspect-oriented concepts, which acts as an interface between front-ends (usually a compiler) and back-ends (usually a virtual machine) of aspect-oriented language implementations. The architecture and the meta-model are embodied for Java-based aspect-oriented languages in the Framework for Implementing Aspect Languages (FIAL) and, as part of that framework, the Language-Independent Aspect Meta-Model (LIAM). FIAL generically implements the workflows required from an execution environment when executing aspects provided in terms of LIAM. In addition to the first-class intermediate representation of aspect-oriented concepts, ALIA - and the FIAL framework as its incarnation - treat the points of interaction between aspects and other modules - so-called join points - as being late-bound to an implementation. In analogy to the object-oriented terminology for late-bound methods, the join points are called virtual in ALIA. Together, the first-class representation of aspect-oriented concepts in the intermediate representation as well as treating join points as being virtual facilitate the implementation of new and effective optimizations for aspect-oriented programs. Three different instantiations of the FIAL framework are presented in this thesis, showcasing the feasibility of integrating language back-ends with different characteristics with the framework. One integration supports static aspect deployment and produces results similar to conventional aspect weavers; the woven code is executable on any standard Java virtual machine.
Two instantiations are fully dynamic, where one is realized as a portable plug-in for standard Java virtual machines and the other one, called Steamloom^ALIA, is realized as a deep integration into a specific virtual machine, the Jikes Research Virtual Machine (Alpern et al., 2005). While the latter instantiation is not portable, it exhibits outstanding performance. Virtual join point dispatch is a generalization of virtual method dispatch. Thus, well-established and elaborate optimization techniques from the field of virtual method dispatch are re-used with slight adaptations in Steamloom^ALIA. These optimizations for aspect-oriented concepts go beyond the generation of optimal bytecode. The power of such optimizations is shown most strikingly in this thesis by the examples of the cflow dynamic property, which may have to be evaluated during virtual join point dispatch, and dynamic aspect deployment, i.e., the selective modification of specific join points' dispatch. In order to evaluate the optimization techniques developed in this thesis, a means for benchmarking has been developed in terms of macro-benchmarks; i.e., real-world applications are executed. These benchmarks show that for both concepts the implementation presented here is at least roughly twice as fast as state-of-the-art implementations performing static optimizations of the generated bytecode; in many cases the optimizations in this thesis even reach a speed-up of two orders of magnitude for the cflow implementation and four orders of magnitude for dynamic deployment. The intermediate representation in terms of LIAM models is general enough to express the constructs of multiple aspect-oriented languages. Therefore, optimizations of features common to different languages are available to applications written in all of them. To prove that the abstractions provided by LIAM are sufficient to act as an intermediate language for multiple aspect-oriented source languages, an automated translation from source code to LIAM models has been realized for three very different and popular aspect-oriented languages: AspectJ, JAsCo and Compose*. In addition, the feasibility of translating from CaesarJ to LIAM models is shown by discussion. The use of an extensible meta-model as intermediate representation furthermore simplifies the definition of new aspect-oriented language concepts, as is shown in this thesis by a tutorial-style example of designing a domain-specific extension to the Java language.
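
    The following C++ sketch is only meant to convey the idea of a late-bound ("virtual") join point whose dispatch can be re-bound at run time when advice is deployed or undeployed; the types and names are invented for illustration and are not the FIAL/LIAM API, which targets Java virtual machines.

    #include <cstdio>
    #include <functional>
    #include <vector>

    struct JoinPoint {
        std::function<void()> base;                                       // original behaviour
        std::vector<std::function<void(std::function<void()>)>> around;   // deployed advice

        void dispatch() {
            // Like a virtual call, the effective behaviour is looked up at the
            // call site; deploying or undeploying advice just edits this list.
            std::function<void()> chain = base;
            for (auto& adv : around) {
                auto inner = chain;
                chain = [adv, inner]() { adv(inner); };
            }
            chain();
        }
    };

    int main() {
        JoinPoint jp;
        jp.base = [] { std::puts("business logic"); };

        jp.dispatch();                            // no aspect deployed

        // Dynamic deployment: rebind the join point instead of re-weaving bytecode.
        jp.around.push_back([](std::function<void()> proceed) {
            std::puts("logging aspect: before");
            proceed();
            std::puts("logging aspect: after");
        });
        jp.dispatch();                            // aspect now intercepts the call
    }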

    Decoupling algorithms from schedules for easy optimization of image processing pipelines

    Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating the computations that define the algorithm with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism. We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code. We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
    National Science Foundation (U.S.) (Grant 0964004); National Science Foundation (U.S.) (Grant 0964218); National Science Foundation (U.S.) (Grant 0832997); United States Dept. of Energy (Award DE-SC0005288); Cognex Corporation; Adobe Systems
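
    For readers unfamiliar with the language, the following is a short sketch in the spirit of the two-stage blur example from the Halide literature, assuming Halide's C++ front end (Func, Var, ImageParam); exact API details may differ between Halide versions. The first three definitions are the algorithm; the two scheduling lines choose tiling, vectorization, parallelism, and where the intermediate stage is computed, and can be changed without touching the algorithm.

    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam input(UInt(16), 2);
        Func blur_x("blur_x"), blur_y("blur_y");
        Var x("x"), y("y"), xi("xi"), yi("yi");

        // Algorithm: pure functions over an unbounded integer domain,
        // with no explicit storage or boundary handling.
        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

        // Schedule: tile the output, vectorize and parallelize it, and compute
        // the intermediate stage per tile instead of storing it whole.
        blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        blur_x.compute_at(blur_y, x).vectorize(x, 8);

        blur_y.compile_jit();    // compile the pipeline; realize() would run it
        return 0;
    }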

    Eliminating intermediate lists in pH using local transformations

    Thesis (M.Eng.) - Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. By Jan-Willem Maessen. Includes bibliographical references (p. 68-69).

    Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping

    Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low-priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk-synchronous processing phases.
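
    A small C++ sketch of this classification is given below: urgent tasks run immediately inside the traversal, while side-effect-free, low-priority tasks are held back in an extra queue and fused into an assembly before being processed. The class names and the batching policy are invented for illustration; they are not the solver's actual code, which sits on top of a task runtime system.

    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <vector>

    struct PatchTask {
        std::function<void()> work;
        bool urgent;            // algorithmically latency-sensitive?
        bool has_side_effects;  // does it alter the global solver state?
    };

    class TaskScheduler {
        std::vector<PatchTask> held_back;   // the additional low-priority queue
    public:
        void submit(PatchTask t) {
            if (t.urgent)
                t.work();                   // run directly in the mesh traversal
            else
                held_back.push_back(std::move(t));
        }

        // Called in imbalanced phases: fuse side-effect-free tasks into one
        // assembly so an optimised kernel can process them together.
        void drain(std::size_t assembly_size) {
            std::vector<PatchTask> assembly;
            for (auto& t : held_back) {
                if (!t.has_side_effects && assembly.size() < assembly_size)
                    assembly.push_back(std::move(t));
                else
                    t.work();               // tasks with side effects run one by one
            }
            held_back.clear();
            std::printf("fused assembly of %zu tasks\n", assembly.size());
            for (auto& t : assembly) t.work();   // stand-in for the fused kernel
        }
    };

    int main() {
        TaskScheduler s;
        s.submit({[] { std::puts("urgent patch update"); }, true, true});
        for (int i = 0; i < 4; ++i)
            s.submit({[] { std::puts("low-priority patch update"); }, false, false});
        s.drain(8);
    }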

    Formal Compiler Implementation in a Logical Framework

    The task of designing and implementing a compiler can be a difficult and error-prone process. In this paper, we present a new approach based on the use of higher-order abstract syntax and term rewriting in a logical framework. All program transformations, from parsing to code generation, are cleanly isolated and specified as term rewrites. This has several advantages. The correctness of the compiler depends solely on a small set of rewrite rules that are written in the language of formal mathematics. In addition, the logical framework guarantees the preservation of scoping, and it automates many frequently occurring tasks including substitution and rewriting strategies. As we show, compiler development in a logical framework can be easier than in a general-purpose language like ML, in part because of automation, and also because the framework provides extensive support for examination, validation, and debugging of the compiler transformations. The paper is organized around a case study, using the MetaPRL logical framework to compile an ML-like language to Intel x86 assembly. We also present a scoped formalization of x86 assembly in which all registers are immutable.
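
    As a generic illustration of expressing a transformation as a rewrite rule (and not of MetaPRL itself, which works with higher-order abstract syntax inside a logical framework), the C++ sketch below applies a single constant-folding rewrite, Add(Const a, Const b) --> Const(a + b), bottom-up over a toy term language; all names are invented for the example.

    #include <cstdio>
    #include <memory>
    #include <utility>

    struct Term {
        enum Kind { Const, Add } kind;
        int value;                              // used when kind == Const
        std::shared_ptr<Term> lhs, rhs;         // used when kind == Add
    };

    std::shared_ptr<Term> cnst(int v) {
        return std::make_shared<Term>(Term{Term::Const, v, nullptr, nullptr});
    }

    std::shared_ptr<Term> add(std::shared_ptr<Term> a, std::shared_ptr<Term> b) {
        return std::make_shared<Term>(Term{Term::Add, 0, std::move(a), std::move(b)});
    }

    // One rewrite rule applied bottom-up. A real framework would also manage
    // binders/scoping and drive rules through rewriting strategies.
    std::shared_ptr<Term> fold(const std::shared_ptr<Term>& t) {
        if (t->kind == Term::Add) {
            auto l = fold(t->lhs), r = fold(t->rhs);
            if (l->kind == Term::Const && r->kind == Term::Const)
                return cnst(l->value + r->value);   // Add(Const a, Const b) -> Const(a+b)
            return add(l, r);
        }
        return t;
    }

    int main() {
        auto prog = add(cnst(1), add(cnst(2), cnst(3)));
        std::printf("folded to constant %d\n", fold(prog)->value);
    }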