22 research outputs found

    Type-Inference Based Short Cut Deforestation (nearly) without Inlining

    Get PDF
    Deforestation optimises a functional program by transforming it into another one that does not create certain intermediate data structures. In [ICFP'99] we presented a type-inference based deforestation algorithm which performs extensive inlining. However, across module boundaries only limited inlining is practically feasible. Furthermore, inlining is a non-trivial transformation which is therefore best implemented as a separate optimisation pass. To perform short cut deforestation (nearly) without inlining, Gill suggested to split definitions into workers and wrappers and inline only the small wrappers, which transfer the information needed for deforestation. We show that Gill's use of a function build limits deforestation and note that his reasons for using build do not apply to our approach. Hence we develop a more general worker/wrapper scheme without build. We give a type-inference based algorithm which splits definitions into workers and wrappers. Finally, we show that we can deforest more expressions with the worker/wrapper scheme than the algorithm with inlining

    Catamorphism-based program transformations for non-strict functional languages

    Get PDF
    In functional languages intermediate data structures are often used as glue to connect separate parts of a program together. These intermediate data structures are useful because they allow modularity, but they are also a cause of inefficiency: each element need to be allocated, to be examined, and to be deallocated. Warm fusion is a program transformation technique which aims to eliminate intermediate data structures. Functions in a program are first transformed into the so called build-cata form, then fused via a one-step rewrite rule, the cata-build rule. In the process of the transformation to build-cata form we attempt to replace explicit recursion with a fixed pattern of recursion (catamorphism). We analyse in detail the problem of removing - possibly mutually recursive sets of - polynomial datatypes. Wehave implemented the warm fusion method in the Glasgow Haskell Compiler, which has allowed practical feedback. One important conclusion is that catamorphisms and fusion in general deserve a more prominent role in the compilation process. We give a detailed measurement of our implementation on a suite of real application programs

    Modular, higher order cardinality analysis in theory and practice

    Get PDF
    Since the mid '80s, compiler writers for functional languages (especially lazy ones) have been writing papers about identifying and exploiting thunks and lambdas that are used only once. However, it has proved difficult to achieve both power and simplicity in practice. In this paper, we describe a new, modular analysis for a higher order language, which is both simple and effective. We prove the analysis sound with respect to a standard call-by-need semantics, and present measurements of its use in a full-scale, state-of-the-art optimising compiler. The analysis finds many single-entry thunks and one-shot lambdas and enables a number of program optimisations. This paper extends our preceding conference publication (Sergey et al. 2014 Proceedings of the 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2014). ACM, pp. 335–348) with proofs, expanded report on evaluation and a detailed examination of the factors causing the loss of precision in the analysis

    Hfusion : a fusion tool based on acid rain plus extensions

    Get PDF
    When constructing programs, it is a usual practice to compose algorithms that solve simpler problems to solve a more complex one. This principle adapts so well to software development because it provides a structure to understand, design, reuse and test programs. In functional languages, algorithms are usually connected through the use of intermediate data structures, which carry the data from one algorithm to another one. The data structures impose a load on the algorithms to allocate, traverse and deallocate them. To alleviate this ine ciency, automatic program transformations have been studied, which produce equivalent programs that make less use of intermediate data structures. We present a set of automatic program transformation techniques based on algebraic laws known as Acid Rain. These techniques allow to remove intermediate data structures in programs containing primitive recursive functions, mutually recursive functions and functions with multiple recursive arguments. We also provide an experimental implementation of our techniques which allows their application on user supplied programs

    Liveness-Based Garbage Collection for Lazy Languages

    Full text link
    We consider the problem of reducing the memory required to run lazy first-order functional programs. Our approach is to analyze programs for liveness of heap-allocated data. The result of the analysis is used to preserve only live data---a subset of reachable data---during garbage collection. The result is an increase in the garbage reclaimed and a reduction in the peak memory requirement of programs. While this technique has already been shown to yield benefits for eager first-order languages, the lack of a statically determinable execution order and the presence of closures pose new challenges for lazy languages. These require changes both in the liveness analysis itself and in the design of the garbage collector. To show the effectiveness of our method, we implemented a copying collector that uses the results of the liveness analysis to preserve live objects, both evaluated (i.e., in WHNF) and closures. Our experiments confirm that for programs running with a liveness-based garbage collector, there is a significant decrease in peak memory requirements. In addition, a sizable reduction in the number of collections ensures that in spite of using a more complex garbage collector, the execution times of programs running with liveness and reachability-based collectors remain comparable

    Hybrid eager and lazy evaluation for efficient compilation of Haskell

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.Includes bibliographical references (p. 208-220).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.The advantage of a non-strict, purely functional language such as Haskell lies in its clean equational semantics. However, lazy implementations of Haskell fall short: they cannot express tail recursion gracefully without annotation. We describe resource-bounded hybrid evaluation, a mixture of strict and lazy evaluation, and its realization in Eager Haskell. From the programmer's perspective, Eager Haskell is simply another implementation of Haskell with the same clean equational semantics. Iteration can be expressed using tail recursion, without the need to resort to program annotations. Under hybrid evaluation, computations are ordinarily executed in program order just as in a strict functional language. When particular stack, heap, or time bounds are exceeded, suspensions are generated for all outstanding computations. These suspensions are re-started in a demand-driven fashion from the root. The Eager Haskell compiler translates Ac, the compiler's intermediate representation, to efficient C code. We use an equational semantics for Ac to develop simple correctness proofs for program transformations, and connect actions in the run-time system to steps in the hybrid evaluation strategy.(cont.) The focus of compilation is efficiency in the common case of straight-line execution; the handling of non-strictness and suspension are left to the run-time system. Several additional contributions have resulted from the implementation of hybrid evaluation. Eager Haskell is the first eager compiler to use a call stack. Our generational garbage collector uses this stack as an additional predictor of object lifetime. Objects above a stack watermark are assumed to be likely to die; we avoid promoting them. Those below are likely to remain untouched and therefore are good candidates for promotion. To avoid eagerly evaluating error checks, they are compiled into special bottom thunks, which are treated specially by the run-time system. The compiler identifies error handling code using a mixture of strictness and type information. This information is also used to avoid inlining error handlers, and to enable aggressive program transformation in the presence of error handling.by Jan-Willem Maessen.Ph.D

    HERMIT: Mechanized Reasoning during Compilation in the Glasgow Haskell Compiler

    Get PDF
    It is difficult to write programs which are both correct and fast. A promising approach, functional programming, is based on the idea of using pure, mathematical functions to construct programs. With effort, it is possible to establish a connection between a specification written in a functional language, which has been proven correct, and a fast implementation, via program transformation. When practiced in the functional programming community, this style of reasoning is still typically performed by hand, by either modifying the source code or using pen-and-paper. Unfortunately, performing such semi-formal reasoning by directly modifying the source code often obfuscates the program, and pen-and-paper reasoning becomes outdated as the program changes over time. Even so, this semi-formal reasoning prevails because formal reasoning is time-consuming, and requires considerable expertise. Formal reasoning tools often only work for a subset of the target language, or require programs to be implemented in a custom language for reasoning. This dissertation investigates a solution, called HERMIT, which mechanizes reasoning during compilation. HERMIT can be used to prove properties about programs written in the Haskell functional programming language, or transform them to improve their performance. Reasoning in HERMIT proceeds in a style familiar to practitioners of pen-and-paper reasoning, and mechanization allows these techniques to be applied to real-world programs with greater confidence. HERMIT can also re-check recorded reasoning steps on subsequent compilations, enforcing a connection with the program as the program is developed. HERMIT is the first system capable of directly reasoning about the full Haskell language. The design and implementation of HERMIT, motivated both by typical reasoning tasks and HERMIT's place in the Haskell ecosystem, is presented in detail. Three case studies investigate HERMIT's capability to reason in practice. These case studies demonstrate that semi-formal reasoning with HERMIT lowers the barrier to writing programs which are both correct and fast

    Supporting high-level, high-performance parallel programming with library-driven optimization

    Get PDF
    Parallel programming is a demanding task for developers partly because achieving scalable parallel speedup requires drawing upon a repertoire of complex, algorithm-specific, architecture-aware programming techniques. Ideally, developers of programming tools would be able to build algorithm-specific, high-level programming interfaces that hide the complex architecture-aware details. However, it is a monumental undertaking to develop such tools from scratch, and it is challenging to provide reusable functionality for developing such tools without sacrificing the hosted interface’s performance or ease of use. In particular, to get high performance on a cluster of multicore computers without requiring developers to manually place data and computation onto processors, it is necessary to combine prior methods for shared memory parallelism with new methods for algorithm-aware distribution of computation and data across the cluster. This dissertation presents Triolet, a programming language and compiler for high-level programming of parallel loops for high-performance execution on clusters of multicore computers. Triolet adopts a simple, familiar programming interface based on traversing collections of data. By incorporating semantic knowledge of how traversals behave, Triolet achieves efficient parallel execution and communication. Moreover, Triolet’s performance on sequential loops is comparable to that of low-level C code, ranging from seven percent slower to 2.8× slower on tested benchmarks. Triolet’s design demonstrates that it is possible to decouple the design of a compiler from the implementation of parallelism without sacrificing performance or ease of use: parallel and sequential loops are implemented as library code and compiled to efficient code by an optimizing compiler that is unaware of parallelism beyond the scope of a single thread. All handling of parallel work partitioning, data partitioning, and scheduling is embodied in library code. During compilation, library code is inlined into a program and specialized to yield customized parallel loops. Experimental results from a 128-core cluster (with 8 nodes and 16 cores per node) show that loops in Triolet outperform loops in Eden, a similar high-level language. Triolet achieves significant parallel speedup over sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code on compute-intensive loops. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. This programming approach opens the potential for future research into parallel programming frameworks

    Program Analysis and Compilation Techniques for Speeding up Transactional Database Workloads

    Get PDF
    There is a trend towards increased specialization of data management software for performance reasons. The improved performance not only leads to a more efficient usage of the underlying hardware and cuts the operation costs of the system, but also is a game-changing competitive advantage for many emerging application domains such as high-frequency algorithmic trading, clickstream analysis, infrastructure monitoring, fraud detection, and online advertising to name a few. In this thesis, we study the automatic specialization and optimization of database application programs -- sequences of queries and updates, augmented with control flow constructs as they appear in database scripts, user-defined functions (UDFs), transactional workloads and triggers in languages such as PL/SQL. We propose to build online transaction processing (OLTP) systems around a modern compiler infrastructure. We show how to build an optimizing compiler for transaction programs using generative programming and state-of-the-art compiler technology, and present techniques for aggressive code inlining, fusion, deforestation, and data structure specialization in the domain of relational transaction programs. We also identify and explore the key optimizations that can be applied in this domain. In addition, we study the advantage of using program dependency analysis and restructuring to enable the concurrency control algorithms to achieve higher performance. Traditionally, optimistic concurrency control algorithms, such as optimistic Multi-Version Concurrency Control (MVCC), avoid blocking concurrent transactions at the cost of having a validation phase. Upon failure in the validation phase, the transaction is usually aborted and restarted from scratch. The "abort and restart" approach becomes a performance bottleneck for use cases with high contention objects or long running transactions. In addition, restarting from scratch creates a negative feedback loop in the system, because the system incurs additional overhead that may create even more conflicts. However, using the dependency information inside the transaction programs, we propose a novel transaction repair approach for in-memory databases. This low overhead approach summarizes the transaction programs in the form of a dependency graph. The dependency graph also contains the constructs used in the validation phase of the MVCC algorithm. Then, when encountering conflicts among transactions, our mechanism quickly detects the conflict locations in the program and partially re-executes the conflicting transactions. This approach maximizes the reuse of the computations done in the first execution round and increases the transaction processing throughput. We evaluate the proposed ideas and techniques in the thesis on some popular benchmarks such as TPC-C and modified versions of TPC-H and TPC-E, as well as other micro-benchmarks. We show that applying these techniques leads to 2x-100x performance improvement in many use cases. Besides, by selectively disabling some of the optimizations in the compiler, we derive a clinical and precise way of obtaining insight into their individual performance contributions
    corecore