Expression Acceleration: Seamless Parallelization of Typed High-Level Languages
Efficient parallelization of algorithms on general-purpose GPUs is essential
today in many areas. However, it is in general a non-trivial task for software
engineers to utilize GPUs to improve the performance of high-level programs.
Although many domain-specific approaches are available for GPU acceleration,
it is difficult to accelerate existing high-level programs without rewriting
parts of them in low-level GPU code. In this paper, we propose a different
approach, in which expressions are marked for acceleration and the compiler
automatically infers which code needs to be accelerated. We call this approach
expression acceleration. We design a compiler pipeline for the approach and
show how to handle several challenges, including expression extraction,
well-formedness, and compilation with multiple backends. The approach is
designed and implemented within a statically typed functional intermediate
language and evaluated on three distinct non-trivial case studies.
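The core idea of expression acceleration can be illustrated with a toy sketch. This is not the paper's compiler pipeline; it only shows, under the assumption of a Python-expression source, the "expression extraction" step: given a marked expression, determine which names it depends on so that their definitions can be compiled for the accelerator backend.

```python
# Toy illustration of expression extraction (not the paper's compiler):
# find the free variables of a marked expression, i.e. the definitions
# a hypothetical accelerator backend would need to compile.
import ast

def free_names(expr_src):
    """Names an expression reads but does not bind itself."""
    tree = ast.parse(expr_src, mode="eval")
    loaded = {n.id for n in ast.walk(tree)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
    stored = {n.id for n in ast.walk(tree)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    return loaded - stored

# The marked expression; `x` and `y` are bound locally, so only
# `sum`, `zip`, `xs`, and `ys` must come from the surrounding program.
expr = "sum(x * y for x, y in zip(xs, ys))"
print(sorted(free_names(expr)))  # ['sum', 'xs', 'ys', 'zip']
```

In the paper's setting this analysis runs on a typed functional intermediate language rather than Python source, and well-formedness checks ensure the extracted code is actually compilable for the chosen backend.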
Efficient CHAD
We show how the basic Combinatory Homomorphic Automatic Differentiation
(CHAD) algorithm can be optimised, using well-known methods, to yield a simple
and generally applicable reverse-mode automatic differentiation (AD) technique
with the computational complexity we would expect of a reverse AD algorithm.
Specifically, we show that the standard optimisations of sparse vectors and
state-passing-style code (as well as defunctionalisation/closure conversion,
for higher-order languages) give us a purely functional algorithm that is most
of the way to the correct complexity, with (functional) mutable updates taking
care of the final log-factors. We provide an Agda formalisation of our
complexity proof. Finally, we discuss how the techniques apply to
differentiating parallel functional programs: the key observations are 1) that
all required mutability is (commutative, associative) accumulation, which lets
us preserve task-parallelism, and 2) that we can write down data-parallel
derivatives for most data-parallel array primitives.
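The two optimisations the abstract names can be sketched in a few lines. This is a minimal illustration, not the paper's CHAD formalisation: the gradient is kept as a sparse mapping, and every update to it is an addition into a single mutable buffer. Because addition is commutative and associative, the order of the updates does not matter, which is what makes parallel accumulation safe.

```python
# Minimal sketch of sparse cotangents plus mutable accumulation
# (illustrative only; not the paper's CHAD algorithm).
from collections import defaultdict

def grad_sum_of_squares(xs, indices):
    """Sparse gradient of sum(xs[i]**2 for i in indices)."""
    grad = defaultdict(float)  # the mutable accumulation target
    for i in indices:
        # Each contribution is only ever *added*; since addition is
        # commutative and associative, these updates could be applied
        # in any order, including in parallel.
        grad[i] += 2.0 * xs[i]
    return dict(grad)

xs = [1.0, 2.0, 3.0, 4.0]
print(grad_sum_of_squares(xs, [1, 3, 1]))  # {1: 8.0, 3: 8.0}
```

The sparse representation avoids materialising a dense zero vector per contribution, and the in-place accumulation avoids the log-factors that a purely functional map would incur.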
Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
We present a novel programming language design that attempts to combine the
clarity and safety of high-level functional languages with the efficiency and
parallelism of low-level numerical languages. We treat arrays as
eagerly-memoized functions on typed index sets, allowing abstract function
manipulations, such as currying, to work on arrays. In contrast to composing
primitive bulk-array operations, we argue for an explicit nested indexing style
that mirrors application of functions to arguments. We also introduce a
fine-grained typed effects system which affords concise and
automatically-parallelized in-place updates. Specifically, an associative
accumulation effect allows reverse-mode automatic differentiation of in-place
updates in a way that preserves parallelism. Empirically, we benchmark against
the Futhark array programming language and demonstrate that aggressive
inlining and type-driven compilation allow array programs to be written in an
expressive, "pointful" style with little performance penalty.

Comment: 31 pages with appendix, 11 figures. A conference submission is still
under review.
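The view of arrays as eagerly memoized functions on index sets can be sketched as follows. This is a toy model in Python, not the paper's language: an "array" is a table built by evaluating a function over a finite index set, and because it is just a function on indices, function-level manipulations such as currying carry over directly.

```python
# Toy sketch (not the paper's language): arrays as eagerly memoized
# functions on finite index sets, with currying as an array operation.

def table(index_set, f):
    """Eagerly memoize f over a finite index set: the 'array'."""
    return {i: f(i) for i in index_set}

# A 2-D 'array' indexed by pairs (row, column).
rows, cols = range(2), range(3)
a = table([(i, j) for i in rows for j in cols],
          lambda ij: ij[0] * 10 + ij[1])

def curry_row(arr, i, cols):
    """Fix the row index: a 2-D array becomes a 1-D array over columns."""
    return {j: arr[(i, j)] for j in cols}

print(curry_row(a, 1, cols))  # {0: 10, 1: 11, 2: 12}
```

In the paper, the index sets are typed and the compiler exploits this structure for type-driven compilation and parallelization; the dictionary here only mimics the "array = function on an index set" reading.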