612 research outputs found

    Common Subexpression Elimination in a Lazy Functional Language

    Get PDF
    Common subexpression elimination is a well-known compiler optimisation that saves time by avoiding the repetition of the same computation. To our knowledge it has not yet been applied to lazy functional programming languages, although there are several advantages. First, the referential transparency of these languages makes the identification of common subexpressions very simple. Second, more common subexpressions can be recognised because they can be of arbitrary type whereas standard common subexpression elimination only shares primitive values. However, because lazy functional languages decouple program structure from data space allocation and control flow, analysing its effects and deciding under which conditions the elimination of a common subexpression is beneficial proves to be quite difficult. We developed and implemented the transformation for the language Haskell by extending the Glasgow Haskell compiler and measured its effectiveness on real-world programs

    Towards an Adaptive Skeleton Framework for Performance Portability

    Get PDF
    The proliferation of widely available, but very different, parallel architectures makes the ability to deliver good parallel performance on a range of architectures, or performance portability, highly desirable. Irregularly-parallel problems, where the number and size of tasks is unpredictable, are particularly challenging and require dynamic coordination. The paper outlines a novel approach to delivering portable parallel performance for irregularly parallel programs. The approach combines declarative parallelism with JIT technology, dynamic scheduling, and dynamic transformation. We present the design of an adaptive skeleton library, with a task graph implementation, JIT trace costing, and adaptive transformations. We outline the architecture of the protoype adaptive skeleton execution framework in Pycket, describing tasks, serialisation, and the current scheduler.We report a preliminary evaluation of the prototype framework using 4 micro-benchmarks and a small case study on two NUMA servers (24 and 96 cores) and a small cluster (17 hosts, 272 cores). Key results include Pycket delivering good sequential performance e.g. almost as fast as C for some benchmarks; good absolute speedups on all architectures (up to 120 on 128 cores for sumEuler); and that the adaptive transformations do improve performance

    Realising nondeterministic I/O in the Glasgow Haskell Compiler

    Get PDF
    In this paper we demonstrate how to relate the semantics given by the nondeterministic call-by-need calculus FUNDIO [SS03] to Haskell. After introducing new correct program transformations for FUNDIO, we translate the core language used in the Glasgow Haskell Compiler into the FUNDIO language, where the IO construct of FUNDIO corresponds to direct-call IO-actions in Haskell. We sketch the investigations of [Sab03b] where a lot of program transformations performed by the compiler have been shown to be correct w.r.t. the FUNDIO semantics. This enabled us to achieve a FUNDIO-compatible Haskell-compiler, by turning o not yet investigated transformations and the small set of incompatible transformations. With this compiler, Haskell programs which use the extension unsafePerformIO in arbitrary contexts, can be compiled in a "safe" manner

    Benchmarking Implementations of Functional Languages with ``Pseudoknot'', a Float-Intensive Benchmark

    Get PDF
    Over 25 implementations of different functional languages are benchmarked using the same program, a floatingpoint intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation.\ud With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time.\ud There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implemtations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations.\ud The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of 'typical' applications.j Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied

    Benchmarking implementations of functional languages with ā€˜Pseudoknot', a float-intensive benchmark

    Get PDF
    Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation. With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time. There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations. The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of ā€˜typical' applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varie

    Tracing and Debugging of Lazy Functional Programs - A Comparative Evaluation of Three Systems

    Get PDF
    In this paper we compare three systems for tracing and debugging Haskell programs: Freja, the Redex Trail System and Hood. We identify the similarities and differences of these systems and we evaluate their usefulness in practice by applying them to a number of small to medium programs in which errors had deliberately been introduced

    Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs

    Get PDF
    In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain

    A Reference Interpreter for the Graph Programming Language GP 2

    Get PDF
    GP 2 is an experimental programming language for computing by graph transformation. An initial interpreter for GP 2, written in the functional language Haskell, provides a concise and simply structured reference implementation. Despite its simplicity, the performance of the interpreter is sufficient for the comparative investigation of a range of test programs. It also provides a platform for the development of more sophisticated implementations.Comment: In Proceedings GaM 2015, arXiv:1504.0244

    Profiling large-scale lazy functional programs

    Get PDF
    The LOLITA natural language processing system is an example of one of the ever increasing number of large-scale systems written entirely in a functional programming language. The system consists of over 50,000 lines of Haskell code and is able to perform a number of tasks such as semantic and pragmatic analysis of text, context scanning and query analysis. Such a system is more useful if the results are calculated in real-time, therefore the efficiency of such a system is paramount. For the past three years we have used profiling tools supplied with the Haskell compilers GHC and HBC to analyse and reason about our programming solutions and have achieved good results; however, our experience has shown that the profiling life-cycle is often too long to make a detailed analysis of a large system possible, and the profiling results are often misleading. A profiling system is developed which allows three types of functionality not previously found in a profiler for lazy functional programs. Firstly, the profiler is able to produce results based on an accurate method of cost inheritance. We have found that this reduces the possibility of the programmer obtaining misleading profiling results. Secondly, the programmer is able to explore the results after the execution of the program. This is done by selecting and deselecting parts of the program using a post-processor. This greatly reduces the analysis time as no further compilation, execution or profiling of the program is needed. Finally, the new profiling system allows the user to examine aspects of the run-time call structure of the program. This is useful in the analysis of the run-time behaviour of the program. Previous attempts at extending the results produced by a profiler in such a way have failed due to the exceptionally high overheads. Exploration of the overheads produced by the new profiling scheme show that typical overheads in profiling the LOLITA system are: a 10% increase in compilation time; a 7% increase in executable size and a 70% run-time overhead. These overheads mean a considerable saving in time in the detailed analysis of profiling a large, lazy functional program
    • ā€¦
    corecore