Large scale numerical software development using functional languages
PhD Thesis
Functional programming languages such as Haskell allow numerical algorithms to be expressed in a
concise, machine-independent manner that closely reflects the underlying mathematical notation in
which the algorithm is described. Unfortunately the price paid for this level of abstraction is usually
a considerable increase in execution time and space usage.
This thesis presents a three-part study of the use of modern purely-functional languages to
develop numerical software.
In Part I the appropriateness and usefulness of language features such as polymorphism, pattern
matching, type-class overloading and non-strict semantics are discussed together with the
limitations they impose. Quantitative statistics concerning the manner in which these features
are used in practice are also presented.
In Part II the information gathered from Part I is used to design and implement FSC, an
experimental functional language tailored to numerical computing, motivated as much by
pragmatic as theoretical issues. This language is then used to develop numerical software and
its suitability assessed via benchmarking it against C/C++ and Haskell under various metrics.
In Part III the work is summarised and assessed.
EPSRC
Parallelizing Julia with a Non-Invasive DSL
Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
road blocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator's programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.
Cheap deforestation for non-strict functional languages
In functional languages intermediate data structures are often used as glue to
connect separate parts of a program together. Deforestation is the process
of automatically removing intermediate data structures. In this thesis we
present and analyse a new approach to deforestation. This new approach is
both practical and general.
We analyse in detail the problem of list removal rather than the more general
problem of arbitrary data structure removal. This more limited scope allows
a complete evaluation of the pragmatic aspects of using our deforestation
technology.
We have implemented our list deforestation algorithm in the Glasgow Haskell
compiler. Our implementation has allowed practical feedback. One important
conclusion is that a new analysis is required to infer function arities
and the linearity of lambda abstractions. This analysis renders the basic
deforestation algorithm far more effective.
We give a detailed assessment of our implementation of deforestation. We
measure the effectiveness of our deforestation on a suite of real application
programs. We also observe the costs of our deforestation algorithm.
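As a minimal illustration of the intermediate structures that deforestation targets, the Python sketch below computes a sum of squares twice: once through two intermediate lists, and once as a single loop. The function names are hypothetical and the fusion is done by hand. It does not reproduce the thesis's algorithm, which performs this kind of transformation automatically inside the Glasgow Haskell compiler.

```python
def sum_squares_unfused(n):
    """Sum of squares of 1..n, built from two intermediate lists."""
    numbers = list(range(1, n + 1))       # intermediate list 1
    squares = [x * x for x in numbers]    # intermediate list 2
    return sum(squares)


def sum_squares_fused(n):
    """Same result with both intermediate lists removed by hand."""
    total = 0
    for x in range(1, n + 1):
        total += x * x
    return total
```

Both functions return the same value for every n; the second simply never allocates the lists that acted as glue between the stages.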
Compiling a domain specific language for dynamic programming
Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006
Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming
We present a novel programming language design that attempts to combine the
clarity and safety of high-level functional languages with the efficiency and
parallelism of low-level numerical languages. We treat arrays as
eagerly-memoized functions on typed index sets, allowing abstract function
manipulations, such as currying, to work on arrays. In contrast to composing
primitive bulk-array operations, we argue for an explicit nested indexing style
that mirrors application of functions to arguments. We also introduce a
fine-grained typed effects system which affords concise and
automatically-parallelized in-place updates. Specifically, an associative
accumulation effect allows reverse-mode automatic differentiation of in-place
updates in a way that preserves parallelism. Empirically, we benchmark against
the Futhark array programming language, and demonstrate that aggressive
inlining and type-driven compilation allows array programs to be written in an
expressive, "pointful" style with little performance penalty.
Comment: 31 pages with appendix, 11 figures. A conference submission is still
under review.
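The idea of treating an array as an eagerly-memoized function on an index set can be sketched in Python as follows. This is an assumption-laden illustration, not the language described in the paper: the names `Table` and `curry_table` are hypothetical, index sets are plain Python iterables rather than types, and nothing here is parallelized or differentiated.

```python
class Table:
    """An array viewed as a function evaluated eagerly on a finite index set."""

    def __init__(self, index_set, fn):
        self.index_set = tuple(index_set)
        # Eager memoization: fn is evaluated once per index, up front.
        self._values = {i: fn(i) for i in self.index_set}

    def __call__(self, i):
        # Indexing is function application on the memoized function.
        return self._values[i]


def curry_table(rows, cols, fn):
    """Build a two-dimensional table in curried form: a table of tables."""
    rows, cols = tuple(rows), tuple(cols)
    return Table(rows, lambda i: Table(cols, lambda j: fn(i, j)))


# Explicit, "pointful" indexing: each row sum is written index by index.
rows, cols = range(2), range(3)
m = curry_table(rows, cols, lambda i, j: 10 * i + j)
row_sums = Table(rows, lambda i: sum(m(i)(j) for j in cols))
```

Because `m` is curried, `m(1)` is itself a table (one row), and `m(1)(2)` applies it to a column index.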
Supporting high-level, high-performance parallel programming with library-driven optimization
Parallel programming is a demanding task for developers partly because achieving scalable parallel speedup requires drawing upon a repertoire of complex, algorithm-specific, architecture-aware programming techniques. Ideally, developers of programming tools would be able to build algorithm-specific, high-level programming interfaces that hide the complex architecture-aware details. However, it is a monumental undertaking to develop such tools from scratch, and it is challenging to provide reusable functionality for developing such tools without sacrificing the hosted interface’s performance or ease of use. In particular, to get high performance on a cluster of multicore computers without requiring developers to manually place data and computation onto processors, it is necessary to combine prior methods for shared memory parallelism with new methods for algorithm-aware distribution of computation and data across the cluster.
This dissertation presents Triolet, a programming language and compiler for high-level programming of parallel loops for high-performance execution on clusters of multicore computers. Triolet adopts a simple, familiar programming interface based on traversing collections of data. By incorporating semantic knowledge of how traversals behave, Triolet achieves efficient parallel execution and communication. Moreover, Triolet’s performance on sequential loops is comparable to that of low-level C code, ranging from seven percent slower to 2.8× slower on tested benchmarks. Triolet’s design demonstrates that it is possible to decouple the design of a compiler from the implementation of parallelism without sacrificing performance or ease of use: parallel and sequential loops are implemented as library code and compiled to efficient code by an optimizing compiler that is unaware of parallelism beyond the scope of a single thread. All handling of parallel work partitioning, data partitioning, and scheduling is embodied in library code. During compilation, library code is inlined into a program and specialized to yield customized parallel loops. Experimental results from a 128-core cluster (with 8 nodes and 16 cores per node) show that loops in Triolet outperform loops in Eden, a similar high-level language. Triolet achieves significant parallel speedup over sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code on compute-intensive loops. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. This programming approach opens the potential for future research into parallel programming frameworks.
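The claim that work partitioning and scheduling can live entirely in library code can be illustrated with a small Python sketch. It is not Triolet: the names `partition` and `par_map_reduce` are hypothetical, the sketch runs on one machine with a thread pool from the standard library, and it assumes that `combine` is associative with `initial` as its identity. Threads are used only to keep the example self-contained; it is not meant to demonstrate speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into at most `parts` contiguous chunks of near-equal size."""
    items = list(items)
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[k:k + size] for k in range(0, len(items), size)]


def par_map_reduce(fn, combine, initial, items, workers=4):
    """Library-level traversal: partition, reduce each chunk, combine the partials."""
    def run_chunk(chunk):
        acc = initial
        for x in chunk:
            acc = combine(acc, fn(x))
        return acc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_chunk, partition(items, workers)))

    result = initial
    for p in partials:
        result = combine(result, p)
    return result
```

A caller writes only the traversal, for example `par_map_reduce(lambda x: x * x, lambda a, b: a + b, 0, range(1, 11))`, and never mentions chunks or workers.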
Scala-Virtualized: Linguistic Reuse for Deep Embeddings
Scala-Virtualized extends the Scala language to better support hosting embedded DSLs. Scala is an expressive language that provides a flexible syntax, type-level computation using implicits, and other features that facilitate the development of embedded DSLs. However, many of these features work well only for shallow embeddings, i.e. DSLs which are implemented as plain libraries. Shallow embeddings automatically profit from features of the host language through linguistic reuse: any DSL expression is just a regular Scala expression. But in many cases, directly executing DSL programs within the host language is not enough and deep embeddings are needed, which reify DSL programs into a data structure representation that can be analyzed, optimized, or further translated. For deep embeddings, linguistic reuse is no longer automatic. Scala-Virtualized defines many of the language’s built-in constructs as method calls, which enables DSLs to redefine the built-in semantics using familiar language mechanisms like overloading and overriding. This in turn enables an easier progression from shallow to deep embeddings, as core language constructs such as conditionals or pattern matching can be redefined to build a reified representation of the operation itself. While this facility brings shallow, syntactic reuse to deep embeddings, we also present examples of what we call deep linguistic reuse: combining shallow and deep components in a single DSL in such a way that certain features are fully implemented in the shallow embedding part and do not need to be reified at the deep embedding level.
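The difference between shallow and deep embeddings can be sketched in Python rather than Scala. In the hypothetical mini-DSL below, overloaded operators build a reified expression tree instead of computing a value, and a separate `evaluate` function interprets the tree. All class and function names are assumptions made for illustration. Because Python's built-in `if` statement cannot be redefined to build a tree, the sketch falls back on an explicit `if_` function; this is the kind of gap that, according to the abstract, Scala-Virtualized closes by defining built-in constructs as method calls.

```python
from dataclasses import dataclass


class Expr:
    """Base class of the reified (deep) representation."""

    def __add__(self, other):
        return Add(self, lift(other))

    def __mul__(self, other):
        return Mul(self, lift(other))


@dataclass(frozen=True)
class Const(Expr):
    value: float


@dataclass(frozen=True)
class Var(Expr):
    name: str


@dataclass(frozen=True)
class Add(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Mul(Expr):
    left: Expr
    right: Expr


@dataclass(frozen=True)
class IfThenElse(Expr):
    cond: Expr
    then: Expr
    orelse: Expr


def lift(x):
    """Wrap plain Python values as DSL constants."""
    return x if isinstance(x, Expr) else Const(x)


def if_(cond, then, orelse):
    """Reified conditional, standing in for the non-overloadable `if`."""
    return IfThenElse(lift(cond), lift(then), lift(orelse))


def evaluate(expr, env):
    """One possible interpretation of the tree; others could optimize or translate it."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    if isinstance(expr, IfThenElse):
        branch = expr.then if evaluate(expr.cond, env) else expr.orelse
        return evaluate(branch, env)
    raise TypeError(f"unknown expression node: {expr!r}")


x = Var("x")
program = if_(x, x * 2 + 1, 0)  # builds a tree; nothing is computed yet
```

Here `program` is a data structure that can be inspected or rewritten before `evaluate(program, {"x": 3})` is ever called.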
Just-In-Time Data Virtualization: Lightweight Data Management with ViDa
As the size of data and its heterogeneity increase, traditional database system architecture becomes an obstacle to data analysis. Integrating and ingesting (loading) data into databases is quickly becoming a bottleneck in the face of massive data as well as increasingly heterogeneous data formats. Still, state-of-the-art approaches typically rely on copying and transforming data into one (or few) repositories. Queries, on the other hand, are often ad-hoc and supported by pre-cooked operators which are not adaptive enough to optimize access to data. As data formats and queries increasingly vary, there is a need to depart from the current status quo of static query processing primitives and build dynamic, fully adaptive architectures. We build ViDa, a system which reads data in its raw format and processes queries using adaptive, just-in-time operators. Our key insight is the use of virtualization, i.e., abstracting data and manipulating it regardless of its original format, and dynamic generation of operators. ViDa's query engine is generated just-in-time; its caches and its query operators adapt to the current query and the workload, while also treating raw datasets as its native storage structures. Finally, ViDa features a language expressive enough to support heterogeneous data models, and to which existing languages can be translated. Users therefore have the power to choose the language best suited for an analysis.
Spores: A Type-Based Foundation for Closures in the Age of Concurrency and Distribution
Functional programming (FP) is regularly touted as the way forward for bringing parallel, concurrent, and distributed programming to the mainstream. The popularity of the rationale behind this viewpoint (immutable data transformed by function application) has even led to a number of object-oriented (OO) programming languages adopting functional features such as lambdas (functions) and thereby function closures. However, despite this established viewpoint of FP as an enabler, reliably distributing function closures over a network, or using them in concurrent environments, nonetheless remains a challenge across FP and OO languages. This paper takes a step towards more principled distributed and concurrent programming by introducing a new closure-like abstraction and type system, called spores, that can guarantee closures to be serializable, thread-safe, or even have general, custom user-defined properties. Crucially, our system is based on the principle of encoding type information corresponding to captured variables in the type of a spore. We prove our type system sound, implement our approach for Scala, evaluate its practicality through a small empirical study, and show the power of these guarantees through a case analysis of real-world distributed and concurrent frameworks that this safe foundation for migratable closures facilitates.
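The principle of tying a closure's guarantees to the types of its captured variables can be loosely imitated at run time in Python. The sketch below is only an analogue under stated assumptions: spores as described in the abstract are a static type system for Scala, whereas the hypothetical `spore` decorator here inspects captured values when the closure is created, using `inspect.getclosurevars` from the standard library.

```python
import inspect


def spore(*allowed_types):
    """Accept a closure only if every captured variable has an allowed type."""
    def check(fn):
        captured = inspect.getclosurevars(fn).nonlocals
        for name, value in captured.items():
            if not isinstance(value, allowed_types):
                raise TypeError(
                    f"captured variable {name!r} has disallowed type "
                    f"{type(value).__name__}"
                )
        # Record the captured types on the function, as a stand-in for
        # encoding them in the spore's type.
        fn.captured_types = {name: type(value) for name, value in captured.items()}
        return fn
    return check


def make_scaler(factor):
    @spore(int, float)          # only immutable numbers may be captured
    def scale(x):
        return x * factor
    return scale
```

A closure that captured a mutable list under the same decorator would be rejected with a `TypeError` at creation time, before it could be shipped to another thread or machine.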
High-Level GPU Programming: Domain-Specific Optimization and Inference
When writing computer software one is often forced to balance the need for high run-time performance with high programmer productivity. By using a high-level language it is often possible to cut development times, but this typically comes at the cost of reduced run-time performance. Using a lower-level language, programs can be made very efficient but at the cost of increased development time. Real-time computer graphics is an area where there are very high demands on both performance and visual quality. Typically, large portions of such applications are written in lower-level languages and also rely on dedicated hardware, in the form of programmable graphics processing units (GPUs), for handling computationally demanding rendering algorithms. These GPUs are parallel stream processors, specialized towards computer graphics, that have computational performance more than an order of magnitude higher than corresponding CPUs. This has revolutionized computer graphics and also led to GPUs being used to solve more general numerical problems, such as fluid and physics simulation, protein folding, image processing, and databases. Unfortunately, the highly specialized nature of GPUs has also made them difficult to program. In this dissertation we show that GPUs can be programmed at a higher level, while maintaining performance, compared to current lower-level languages. By constructing a domain-specific language (DSL), which provides appropriate domain-specific abstractions and user-annotations, it is possible to write programs in a more abstract and modular manner. Using knowledge of the domain it is possible for the DSL compiler to generate very efficient code. We show by experiment that the performance of our DSLs is equal to that of GPU programs written by hand using current low-level languages. Also, control over the trade-offs between visual quality and performance is retained.
In the papers included in this dissertation, we present domain-specific languages targeted at numerical processing and computer graphics, respectively. These DSLs have been implemented as embedded languages in Python, a dynamic programming language that provides a rich set of high-level features. In this dissertation we show how these features can be used to facilitate the construction of embedded languages.