1,222 research outputs found

    Compiling Signal Processing Code embedded in Haskell via LLVM

    Full text link
    We discuss a programming language for real-time audio signal processing that is embedded in the functional language Haskell and uses the Low-Level Virtual Machine as back-end. With that framework we can code with the comfort and type safety of Haskell while achieving maximum efficiency of fast inner loops and full vectorisation. This way Haskell becomes a valuable alternative to special purpose signal processing languages.Comment: 8 pages, 1 figure, 3 listings, 1 table, accepted by Linux Audio Conference LAC201

    Effective Extensible Programming: Unleashing Julia on GPUs

    Full text link
    GPUs and other accelerators are popular devices for accelerating compute-intensive, parallelizable applications. However, programming these devices is a difficult task. Writing efficient device code is challenging, and is typically done in a low-level programming language. High-level languages are rarely supported, or do not integrate with the rest of the high-level language ecosystem. To overcome this, we propose compiler infrastructure to efficiently add support for new hardware or environments to an existing programming language. We evaluate our approach by adding support for NVIDIA GPUs to the Julia programming language. By integrating with the existing compiler, we significantly lower the cost to implement and maintain the new compiler, and facilitate reuse of existing application code. Moreover, use of the high-level Julia programming language enables new and dynamic approaches for GPU programming. This greatly improves programmer productivity, while maintaining application performance similar to that of the official NVIDIA CUDA toolkit

    Souper: A Synthesizing Superoptimizer

    Full text link
    If we can automatically derive compiler optimizations, we might be able to sidestep some of the substantial engineering challenges involved in creating and maintaining a high-quality compiler. We developed Souper, a synthesizing superoptimizer, to see how far these ideas might be pushed in the context of LLVM. Along the way, we discovered that Souper's intermediate representation was sufficiently similar to the one in Microsoft Visual C++ that we applied Souper to that compiler as well. Shipping, or about-to-ship, versions of both compilers contain optimizations suggested by Souper but implemented by hand. Alternately, when Souper is used as a fully automated optimization pass it compiles a Clang compiler binary that is about 3 MB (4.4%) smaller than the one compiled by LLVM

    High-level GPU programming in Julia

    Full text link
    GPUs are popular devices for accelerating scientific calculations. However, as GPU code is usually written in low-level languages, it breaks the abstractions of high-level languages popular with scientific programmers. To overcome this, we present a framework for CUDA GPU programming in the high-level Julia programming language. This framework compiles Julia source code for GPU execution, and takes care of the necessary low-level interactions using modern code generation techniques to avoid run-time overhead. Evaluating the framework and its APIs on a case study comprising the trace transform from the field of image processing, we find that the impact on performance is minimal, while greatly increasing programmer productivity. The metaprogramming capabilities of the Julia language proved invaluable for enabling this. Our framework significantly improves usability of GPUs, making them accessible for a wide range of programmers. It is available as free and open-source software licensed under the MIT License

    Building Efficient Query Engines in a High-Level Language

    Get PDF
    Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines

    ClangJIT: Enhancing C++ with Just-in-Time Compilation

    Full text link
    The C++ programming language is not only a keystone of the high-performance-computing ecosystem but has proven to be a successful base for portable parallel-programming frameworks. As is well known, C++ programmers use templates to specialize algorithms, thus allowing the compiler to generate highly-efficient code for specific parameters, data structures, and so on. This capability has been limited to those specializations that can be identified when the application is compiled, and in many critical cases, compiling all potentially-relevant specializations is not practical. ClangJIT provides a well-integrated C++ language extension allowing template-based specialization to occur during program execution. This capability has been implemented for use in large-scale applications, and we demonstrate that just-in-time-compilation-based dynamic specialization can be integrated into applications, often requiring minimal changes (or no changes) to the applications themselves, providing significant performance improvements, programmer-productivity improvements, and decreased compilation time

    Automatic differentiation in ML: Where we are and where we should be going

    Full text link
    We review the current state of automatic differentiation (AD) for array programming in machine learning (ML), including the different approaches such as operator overloading (OO) and source transformation (ST) used for AD, graph-based intermediate representations for programs, and source languages. Based on these insights, we introduce a new graph-based intermediate representation (IR) which specifically aims to efficiently support fully-general AD for array programming. Unlike existing dataflow programming representations in ML frameworks, our IR naturally supports function calls, higher-order functions and recursion, making ML models easier to implement. The ability to represent closures allows us to perform AD using ST without a tape, making the resulting derivative (adjoint) program amenable to ahead-of-time optimization using tools from functional language compilers, and enabling higher-order derivatives. Lastly, we introduce a proof of concept compiler toolchain called Myia which uses a subset of Python as a front end

    Reasoning About LLVM Code Using Codewalker

    Full text link
    This paper reports on initial experiments using J Moore's Codewalker to reason about programs compiled to the Low-Level Virtual Machine (LLVM) intermediate form. Previously, we reported on a translator from LLVM to the applicative subset of Common Lisp accepted by the ACL2 theorem prover, producing executable ACL2 formal models, and allowing us to both prove theorems about the translated models as well as validate those models by testing. That translator provided many of the benefits of a pure decompilation into logic approach, but had the disadvantage of not being verified. The availability of Codewalker as of ACL2 7.0 has provided an opportunity to revisit this idea, and employ a more trustworthy decompilation into logic tool. Thus, we have employed the Codewalker method to create an interpreter for a subset of the LLVM instruction set, and have used Codewalker to analyze some simple array-based C programs compiled to LLVM form. We discuss advantages and limitations of the Codewalker-based method compared to the previous method, and provide some challenge problems for future Codewalker development.Comment: In Proceedings ACL2 2015, arXiv:1509.0552

    Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs

    Full text link
    Google's Cloud TPUs are a promising new hardware architecture for machine learning workloads. They have powered many of Google's milestone machine learning achievements in recent years. Google has now made TPUs available for general use on their cloud platform and as of very recently has opened them up further to allow use by non-TensorFlow frontends. We describe a method and implementation for offloading suitable sections of Julia programs to TPUs via this new API and the Google XLA compiler. Our method is able to completely fuse the forward pass of a VGG19 model expressed as a Julia program into a single TPU executable to be offloaded to the device. Our method composes well with existing compiler-based automatic differentiation techniques on Julia code, and we are thus able to also automatically obtain the VGG19 backwards pass and similarly offload it to the TPU. Targeting TPUs using our compiler, we are able to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s which compares favorably to the 52.4s required for the original model on the CPU. Our implementation is less than 1000 lines of Julia, with no TPU specific changes made to the core Julia compiler or any other Julia packages.Comment: Submitted to SysML 201

    ParaSail: A Pointer-Free Pervasively-Parallel Language for Irregular Computations

    Full text link
    ParaSail is a language specifically designed to simplify the construction of programs that make full, safe use of parallel hardware even while manipulating potentially irregular data structures. As parallel hardware has proliferated, there has been an urgent need for languages that ease the writing of correct parallel programs. ParaSail achieves these goals largely through simplification of the language, rather than by adding numerous rules. In particular, ParaSail eliminates global variables, parameter aliasing, and most significantly, re-assignable pointers. ParaSail has adopted a pointer-free approach to defining complex data structures. Rather than using pointers, ParaSail supports flexible data structuring using expandable (and shrinkable) objects implemented using region-based storage management, along with generalized indexing. By eliminating global variables, parameter aliasing, and pointers, ParaSail reduces the complexity for the programmer, while still allowing ParaSail to provide flexible, pervasive, safe, parallel programming for irregular computations. Perhaps the most interesting discovery in this language development effort, based on over six years of use by the author and a group of ParaSail users, has been that it is possible to simultaneously simplify the language, support parallel programming with advanced data structures, and maintain flexibility and efficiency
    corecore