3,209 research outputs found
Array operators using multiple dispatch: a design methodology for array implementations in dynamic languages
Arrays are such a rich and fundamental data type that they tend to be built
into a language, either in the compiler or in a large low-level library.
Defining this functionality at the user level instead provides greater
flexibility for application domains not envisioned by the language designer.
Only a few languages, such as C++ and Haskell, provide the necessary power to
define -dimensional arrays, but these systems rely on compile-time
abstraction, sacrificing some flexibility. In contrast, dynamic languages make
it straightforward for the user to define any behavior they might want, but at
the possible expense of performance.
As part of the Julia language project, we have developed an approach that
yields a novel trade-off between flexibility and compile-time analysis. The
core abstraction we use is multiple dispatch. We have come to believe that
while multiple dispatch has not been especially popular in most kinds of
programming, technical computing is its killer application. By expressing key
functions such as array indexing using multi-method signatures, a surprising
range of behaviors can be obtained, in a way that is both relatively easy to
write and amenable to compiler analysis. The compact factoring of concerns
provided by these methods makes it easier for user-defined types to behave
consistently with types in the standard library.Comment: 6 pages, 2 figures, workshop paper for the ARRAY '14 workshop, June
11, 2014, Edinburgh, United Kingdo
Extensible sparse functional arrays with circuit parallelism
A longstanding open question in algorithms and data structures is the time and space complexity of pure functional arrays. Imperative arrays provide update and lookup operations that require constant time in the RAM theoretical model, but it is conjectured that there does not exist a RAM algorithm that achieves the same complexity for functional arrays, unless restrictions are placed on the operations. The main result of this paper is an algorithm that does achieve optimal unit time and space complexity for update and lookup on functional arrays. This algorithm does not run on a RAM, but instead it exploits the massive parallelism inherent in digital circuits. The algorithm also provides unit time operations that support storage management, as well as sparse and extensible arrays. The main idea behind the algorithm is to replace a RAM memory by a tree circuit that is more powerful than the RAM yet has the same asymptotic complexity in time (gate delays) and size (number of components). The algorithm uses an array representation that allows elements to be shared between many arrays with only a small constant factor penalty in space and time. This system exemplifies circuit parallelism, which exploits very large numbers of transistors per chip in order to speed up key algorithms. Extensible Sparse Functional Arrays (ESFA) can be used with both functional and imperative programming languages. The system comprises a set of algorithms and a circuit specification, and it has been implemented on a GPGPU with good performance
Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations
We propose Paraiso, a domain specific language embedded in functional
programming language Haskell, for automated tuning of explicit solvers of
partial differential equations (PDEs) on GPUs as well as multicore CPUs. In
Paraiso, one can describe PDE solving algorithms succinctly using tensor
equations notation. Hydrodynamic properties, interpolation methods and other
building blocks are described in abstract, modular, re-usable and combinable
forms, which lets us generate versatile solvers from little set of Paraiso
source codes.
We demonstrate Paraiso by implementing a compressive hydrodynamics solver. A
single source code less than 500 lines can be used to generate solvers of
arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both
manual annotation based tuning and evolutionary computing based automated
tuning of the program.Comment: 52 pages, 14 figures, accepted for publications in Computational
Science and Discover
Mainstream parallel array programming on cell
We present the E] compiler and runtime library for the ‘F’ subset of
the Fortran 95 programming language. ‘F’ provides first-class support for arrays,
allowing E] to implicitly evaluate array expressions in parallel using the SPU coprocessors
of the Cell Broadband Engine. We present performance results from
four benchmarks that all demonstrate absolute speedups over equivalent ‘C’ or
Fortran versions running on the PPU host processor. A significant benefit of this
straightforward approach is that a serial implementation of any code is always
available, providing code longevity, and a familiar development paradigm
Stream Fusion, to Completeness
Stream processing is mainstream (again): Widely-used stream libraries are now
available for virtually all modern OO and functional languages, from Java to C#
to Scala to OCaml to Haskell. Yet expressivity and performance are still
lacking. For instance, the popular, well-optimized Java 8 streams do not
support the zip operator and are still an order of magnitude slower than
hand-written loops. We present the first approach that represents the full
generality of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream interaction.
We support any combination of zipping, nesting (or flat-mapping), sub-ranging,
filtering, mapping-of finite or infinite streams. Our model captures
idiosyncrasies that a programmer uses in optimizing stream pipelines, such as
rate differences and the choice of a "for" vs. "while" loops. Our approach
delivers hand-written-like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers, offering
highest, guaranteed and portable performance. Our approach relies on high-level
concepts that are then readily mapped into an implementation. Accordingly, we
have two distinct implementations: an OCaml stream library, staged via
MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster than past
work. We greatly exceed in performance the standard stream libraries available
in Java, Scala and OCaml, including the well-optimized Java 8 streams
- …