Compiling Signal Processing Code embedded in Haskell via LLVM
We discuss a programming language for real-time audio signal processing that
is embedded in the functional language Haskell and uses the Low-Level Virtual
Machine as back-end. With that framework we can code with the comfort and type
safety of Haskell while achieving maximum efficiency of fast inner loops and
full vectorisation. This way Haskell becomes a valuable alternative to special
purpose signal processing languages.
Comment: 8 pages, 1 figure, 3 listings, 1 table, accepted by Linux Audio Conference LAC201
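The core idea of the paper, embedding a signal-processing DSL in a host language and compiling it to fast code, can be sketched in a few lines. The sketch below is illustrative only (the paper uses Haskell and lowers to LLVM, not Python); the `Sig` class and `compile_expr` helper are hypothetical names, and host-language operator overloading stands in for Haskell's type-class-based embedding.

```python
# Sketch: operator overloading builds a symbolic expression tree,
# which is then compiled once into a fast inner-loop function
# (the paper lowers to LLVM instead of Python source).
class Sig:
    """Symbolic signal-expression node (hypothetical helper)."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __add__(self, other): return Sig("add", self, other)
    def __mul__(self, other): return Sig("mul", self, other)

def var(name):
    return Sig("var", name)

def compile_expr(e):
    """Lower the expression tree to source text and compile it."""
    def emit(e):
        if e.op == "var":
            return e.args[0]
        a, b = (emit(x) for x in e.args)
        return f"({a} + {b})" if e.op == "add" else f"({a} * {b})"
    return eval(f"lambda t, freq: {emit(e)}")

expr = var("t") * var("freq")   # built with ordinary host-language syntax
f = compile_expr(expr)
print(f(2.0, 3.0))  # 6.0
```

The payoff is the same as in the paper: the user writes in the comfortable host language, but the inner loop runs compiled code rather than an interpreter over the expression tree.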
Effective Extensible Programming: Unleashing Julia on GPUs
GPUs and other accelerators are popular devices for accelerating
compute-intensive, parallelizable applications. However, programming these
devices is a difficult task. Writing efficient device code is challenging, and
is typically done in a low-level programming language. High-level languages are
rarely supported, or do not integrate with the rest of the high-level language
ecosystem. To overcome this, we propose compiler infrastructure to efficiently
add support for new hardware or environments to an existing programming
language.
We evaluate our approach by adding support for NVIDIA GPUs to the Julia
programming language. By integrating with the existing compiler, we
significantly lower the cost to implement and maintain the new compiler, and
facilitate reuse of existing application code. Moreover, use of the high-level
Julia programming language enables new and dynamic approaches for GPU
programming. This greatly improves programmer productivity, while maintaining
application performance similar to that of the official NVIDIA CUDA toolkit.
Souper: A Synthesizing Superoptimizer
If we can automatically derive compiler optimizations, we might be able to
sidestep some of the substantial engineering challenges involved in creating
and maintaining a high-quality compiler. We developed Souper, a synthesizing
superoptimizer, to see how far these ideas might be pushed in the context of
LLVM. Along the way, we discovered that Souper's intermediate representation
was sufficiently similar to the one in Microsoft Visual C++ that we applied
Souper to that compiler as well. Shipping, or about-to-ship, versions of both
compilers contain optimizations suggested by Souper but implemented by hand.
Alternatively, when Souper is used as a fully automated optimization pass, it
compiles a Clang compiler binary that is about 3 MB (4.4%) smaller than the one
compiled by LLVM.
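The mechanism behind a synthesizing superoptimizer can be illustrated with a toy version: enumerate candidate replacements, cheapest first, and keep the first one that is equivalent to the original code. This is only a sketch under simplifying assumptions; Souper proves equivalence over its IR with an SMT solver, whereas the toy below merely tests on sample inputs, and the candidate table is invented for illustration.

```python
# Toy superoptimizer: find the cheapest candidate expression that
# agrees with the target on all test inputs. Souper instead works on
# LLVM IR and verifies equivalence with an SMT solver.
def target(x):
    return x * 2 - x   # the code we would like to optimize

CANDIDATES = [         # hypothetical candidates, ordered cheapest-first
    ("x", lambda x: x),
    ("x + x", lambda x: x + x),
    ("x << 1", lambda x: x << 1),
]

def superoptimize(f, tests=range(-8, 9)):
    for name, cand in CANDIDATES:
        if all(cand(t) == f(t) for t in tests):
            return name      # first (cheapest) equivalent rewrite
    return None

print(superoptimize(target))  # "x"
```

Here the search discovers that `x * 2 - x` simplifies to `x`; in Souper, such discovered rewrites are then implemented by hand in the production compiler.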
High-level GPU programming in Julia
GPUs are popular devices for accelerating scientific calculations. However,
as GPU code is usually written in low-level languages, it breaks the
abstractions of high-level languages popular with scientific programmers. To
overcome this, we present a framework for CUDA GPU programming in the
high-level Julia programming language. This framework compiles Julia source
code for GPU execution, and takes care of the necessary low-level interactions
using modern code generation techniques to avoid run-time overhead.
Evaluating the framework and its APIs on a case study comprising the trace
transform from the field of image processing, we find that the impact on
performance is minimal, while programmer productivity increases greatly. The
metaprogramming capabilities of the Julia language proved invaluable for
enabling this. Our framework significantly improves usability of GPUs, making
them accessible for a wide range of programmers. It is available as free and
open-source software licensed under the MIT License.
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming makes it easy to implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines.
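The generative-programming technique at the heart of this approach can be sketched concretely: instead of interpreting a query plan row by row, generate specialized source code for the whole query and compile it once. The sketch below is an assumption-laden Python analogue (LegoBase emits low-level C from high-level Scala); `compile_filter_sum` and its dictionary-based rows are invented for illustration.

```python
# Sketch of generative query compilation: the query plan is turned into
# specialized source code, compiled once, then run with no per-row
# interpretation overhead.
def compile_filter_sum(column, predicate_src):
    """Generate and compile a specialized aggregation loop.
    `predicate_src` is an expression over the row variable `r`."""
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if {predicate_src}:\n"
        f"            total += r[{column!r}]\n"
        "    return total\n"
    )
    ns = {}
    exec(src, ns)        # one-time compilation of the whole query
    return ns["query"]

q = compile_filter_sum("price", "r['qty'] > 10")
rows = [{"price": 5, "qty": 20}, {"price": 7, "qty": 3}]
print(q(rows))  # 5
```

Because the predicate and column access are baked into the generated loop, optimizations such as changing the data layout amount to changing the code generator, not the engine, which is the "abstraction without regret" the article argues for.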
ClangJIT: Enhancing C++ with Just-in-Time Compilation
The C++ programming language is not only a keystone of the
high-performance-computing ecosystem but has proven to be a successful base for
portable parallel-programming frameworks. As is well known, C++ programmers use
templates to specialize algorithms, thus allowing the compiler to generate
highly-efficient code for specific parameters, data structures, and so on. This
capability has been limited to those specializations that can be identified
when the application is compiled, and in many critical cases, compiling all
potentially-relevant specializations is not practical. ClangJIT provides a
well-integrated C++ language extension allowing template-based specialization
to occur during program execution. This capability has been implemented for use
in large-scale applications, and we demonstrate that
just-in-time-compilation-based dynamic specialization can be integrated into
applications, often with minimal or no changes to the applications themselves,
providing significant performance improvements, programmer-productivity
improvements, and decreased compilation time.
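The essence of run-time template specialization, generate and cache a function specialized for a parameter known only at run time, can be shown in a dynamic-language analogue. This is a sketch, not ClangJIT's mechanism: ClangJIT instantiates real C++ templates inside the running program via an embedded Clang, while the Python below merely mimics the idiom; `make_power` is a hypothetical stand-in for instantiating something like `power<N>(x)` at run time.

```python
# Analogue of run-time template instantiation: specialize a function
# for a value known only at run time, and cache each instantiation so
# it is compiled exactly once per distinct parameter.
from functools import lru_cache

@lru_cache(maxsize=None)
def make_power(n):
    """Return a function specialized for exponent `n` (the loop is
    unrolled into a chain of multiplies, as a template would be)."""
    body = " * ".join(["x"] * n) or "1"
    return eval(f"lambda x: {body}")

cube = make_power(3)
print(cube(2))  # 8
assert make_power(3) is cube   # the instantiation is cached
```

The cache is what makes the technique practical: the first call for a given parameter pays the compilation cost, and every later call reuses the specialized code, mirroring ClangJIT's one-time instantiation per template-argument set.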
Automatic differentiation in ML: Where we are and where we should be going
We review the current state of automatic differentiation (AD) for array
programming in machine learning (ML), including the different approaches such
as operator overloading (OO) and source transformation (ST) used for AD,
graph-based intermediate representations for programs, and source languages.
Based on these insights, we introduce a new graph-based intermediate
representation (IR) which specifically aims to efficiently support
fully-general AD for array programming. Unlike existing dataflow programming
representations in ML frameworks, our IR naturally supports function calls,
higher-order functions and recursion, making ML models easier to implement. The
ability to represent closures allows us to perform AD using ST without a tape,
making the resulting derivative (adjoint) program amenable to ahead-of-time
optimization using tools from functional language compilers, and enabling
higher-order derivatives. Lastly, we introduce a proof-of-concept compiler
toolchain called Myia, which uses a subset of Python as a front end.
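The operator-overloading (OO) style of AD that the abstract contrasts with source transformation can be made concrete in a few lines: every arithmetic operation records its inputs and local derivatives, and gradients flow backwards through the recorded graph. This is a minimal textbook sketch, not Myia's approach (Myia uses source transformation without a tape); the `Var` class is invented for illustration.

```python
# Minimal reverse-mode AD via operator overloading: each op records
# (parent, local_derivative) pairs, and backward() propagates the
# chain rule through the recorded graph.
class Var:
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0
    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])
    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])
    def backward(self, seed=1.0):
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(3.0), Var(4.0)
z = x * y + x            # z = x*y + x
z.backward()
print(x.grad, y.grad)    # 5.0 3.0  (dz/dx = y + 1, dz/dy = x)
```

The recorded `parents` lists play the role of the tape; a source-transformation approach like the paper's instead generates the adjoint program ahead of time, which is what makes it amenable to whole-program optimization.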
Reasoning About LLVM Code Using Codewalker
This paper reports on initial experiments using J Moore's Codewalker to
reason about programs compiled to the Low-Level Virtual Machine (LLVM)
intermediate form. Previously, we reported on a translator from LLVM to the
applicative subset of Common Lisp accepted by the ACL2 theorem prover,
producing executable ACL2 formal models, and allowing us to both prove theorems
about the translated models as well as validate those models by testing. That
translator provided many of the benefits of a pure decompilation into logic
approach, but had the disadvantage of not being verified. The availability of
Codewalker as of ACL2 7.0 has provided an opportunity to revisit this idea, and
employ a more trustworthy decompilation into logic tool. Thus, we have employed
the Codewalker method to create an interpreter for a subset of the LLVM
instruction set, and have used Codewalker to analyze some simple array-based C
programs compiled to LLVM form. We discuss advantages and limitations of the
Codewalker-based method compared to the previous method, and provide some
challenge problems for future Codewalker development.
Comment: In Proceedings ACL2 2015, arXiv:1509.0552
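The shape of such a formal model, an interpreter over a small register-based instruction set, is easy to convey with a toy example. The sketch below is a Python illustration only (the actual work defines the interpreter in ACL2's applicative Common Lisp so Codewalker can reason about it); the instruction encoding is invented.

```python
# Toy interpreter for a few LLVM-like register instructions, encoded as
# (op, dst, a, b) tuples. A formal model of this shape is what
# Codewalker decompiles programs against.
def interpret(program, args):
    """Execute instructions over a register file; 'ret' returns a register."""
    regs = dict(args)
    for op, dst, a, b in program:
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":
            regs[dst] = regs[a] * regs[b]
        elif op == "ret":
            return regs[dst]
    raise RuntimeError("program fell off the end")

# %t = mul %x, %x ; %r = add %t, %y ; ret %r   (computes x*x + y)
prog = [("mul", "t", "x", "x"),
        ("add", "r", "t", "y"),
        ("ret", "r", None, None)]
print(interpret(prog, {"x": 3, "y": 1}))  # 10
```

Once the interpreter is the semantics, proving a property of a compiled C program reduces to reasoning about this function on the program's instruction list, which is precisely what Codewalker automates.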
Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs
Google's Cloud TPUs are a promising new hardware architecture for machine
learning workloads. They have powered many of Google's milestone machine
learning achievements in recent years. Google has now made TPUs available for
general use on their cloud platform and as of very recently has opened them up
further to allow use by non-TensorFlow frontends. We describe a method and
implementation for offloading suitable sections of Julia programs to TPUs via
this new API and the Google XLA compiler. Our method is able to completely fuse
the forward pass of a VGG19 model expressed as a Julia program into a single
TPU executable to be offloaded to the device. Our method composes well with
existing compiler-based automatic differentiation techniques on Julia code, and
we are thus able to also automatically obtain the VGG19 backwards pass and
similarly offload it to the TPU. Targeting TPUs using our compiler, we are able
to evaluate the VGG19 forward pass on a batch of 100 images in 0.23s, which
compares favorably to the 52.4s required for the original model on the CPU. Our
implementation is less than 1000 lines of Julia, with no TPU specific changes
made to the core Julia compiler or any other Julia packages.
Comment: Submitted to SysML 201
ParaSail: A Pointer-Free Pervasively-Parallel Language for Irregular Computations
ParaSail is a language specifically designed to simplify the construction of
programs that make full, safe use of parallel hardware even while manipulating
potentially irregular data structures. As parallel hardware has proliferated,
there has been an urgent need for languages that ease the writing of correct
parallel programs. ParaSail achieves these goals largely through simplification
of the language, rather than by adding numerous rules. In particular, ParaSail
eliminates global variables, parameter aliasing, and most significantly,
re-assignable pointers. ParaSail has adopted a pointer-free approach to
defining complex data structures. Rather than using pointers, ParaSail supports
flexible data structuring using expandable (and shrinkable) objects implemented
using region-based storage management, along with generalized indexing. By
eliminating global variables, parameter aliasing, and pointers, ParaSail
reduces the complexity for the programmer, while still allowing ParaSail to
provide flexible, pervasive, safe, parallel programming for irregular
computations. Perhaps the most interesting discovery in this language
development effort, based on over six years of use by the author and a group of
ParaSail users, has been that it is possible to simultaneously simplify the
language, support parallel programming with advanced data structures, and
maintain flexibility and efficiency.