7,580 research outputs found

    Interprocedural Type Specialization of JavaScript Programs Without Type Analysis

    Get PDF
    Dynamically typed programming languages such as Python and JavaScript defer type checking to run time. VM implementations can improve performance by eliminating redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. Lazy basic block versioning is a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on-the-fly while propagating context-dependent type information. This approach does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses. This paper extends lazy basic block versioning to propagate type information interprocedurally, across function call boundaries. Our implementation in a JavaScript JIT compiler shows that across 26 benchmarks, interprocedural basic block versioning eliminates more type tag tests on average than what is achievable with static type analysis without resorting to code transformations. On average, 94.3% of type tag tests are eliminated, yielding speedups of up to 56%. We also show that our implementation is able to outperform Truffle/JS on several benchmarks, both in terms of execution time and compilation time.Comment: 10 pages, 10 figures, submitted to CGO 201

    Simple and Effective Type Check Removal through Lazy Basic Block Versioning

    Get PDF
    Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has lead to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on-the-fly while propagating context-dependent type information. This does not require the use of costly program analyses, is not restricted by the precision limitations of traditional type analyses and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler. This approach is compared with a classical flow-based type analysis. Lazy basic block versioning performs as well or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity and good run time performance makes basic block versioning attractive for baseline JIT compilers

    Master of Science

    Get PDF
    thesisScientific libraries are written in a general way in anticipation of a variety of use cases that reduce optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, the architectural platform and its backend compiler. Such specialization is not typically done because it is time-consuming, leads to nonportable code and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in the above context could potentially reduce the extensive under-standing required while significantly improving performance, code reuse and portability. In this work, we study the performance gains achieved by specializing the sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scientific applications on the Hopper Cray XE6 Supercomputer at NERSC. This work takes an initial step towards automating the specialization of scientific libraries. We study the effects of the execution environment on sparse computations and design optimization strategies based on these effects. These strategies include novel techniques that augment well-known source-to-source transformations to significantly improve the quality of the instructions generated by the back end compiler. We use CHiLL (Composable High-Level Loop Transformation Framework) to apply source-level transformations tailored to the special needs of sparse computations. A conceptual framework is proposed where the above strategies are developed and expressed as recipes by experienced performance engineers that can be applied across execution environments. We demonstrate significant performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities

    Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors

    Full text link
    This paper presents a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. Architectural diversity among different processors together with structural diversity among different sparse matrices lead to bottleneck diversity. This justifies an SpMV optimizer that is both matrix- and architecture-adaptive through runtime specialization. To this direction, we present an approach that first identifies the performance bottlenecks of SpMV for a given sparse matrix on the target platform either through profiling or by matrix property inspection, and then selects suitable optimizations to tackle those bottlenecks. Our optimization pool is based on the widely used Compressed Sparse Row (CSR) sparse matrix storage format and has low preprocessing overheads, making our overall approach practical even in cases where fast decision making and optimization setup is required. We evaluate our optimizer on three x86-based computing platforms and demonstrate that it is able to distinguish and appropriately optimize SpMV for the majority of matrices in a representative test suite, leading to significant speedups over the CSR and Inspector-Executor CSR SpMV kernels available in the latest release of the Intel MKL library.Comment: 10 pages, 7 figures, ICPP 201

    Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part II: application to partial differential equations

    Full text link
    A template-based generic programming approach was presented in a previous paper that separates the development effort of programming a physical model from that of computing additional quantities, such as derivatives, needed for embedded analysis algorithms. In this paper, we describe the implementation details for using the template-based generic programming approach for simulation and analysis of partial differential equations (PDEs). We detail several of the hurdles that we have encountered, and some of the software infrastructure developed to overcome them. We end with a demonstration where we present shape optimization and uncertainty quantification results for a 3D PDE application
    • …
    corecore