    Automatic SIMD vectorization for Haskell

    Expressing algorithms using immutable arrays greatly simplifies automatic SIMD vectorization, since several important classes of dependency violations cannot occur. The Haskell programming language provides libraries for programming with immutable arrays, and compiler support for optimizing them to eliminate the overhead of intermediate temporary arrays. We describe an implementation of automatic SIMD vectorization in a Haskell compiler which gives substantial vector speedups for a range of programs written in a natural programming style. We compare performance with that of programs compiled by the Glasgow Haskell Compiler.
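    The abstract shows no code, but a minimal sketch of the programming style it describes might look as follows (assuming the unboxed vectors of the widely used vector package; the vectorizing compiler itself is not shown): element-wise arithmetic over immutable arrays, where fusion removes the intermediate temporary and the loop is free of the dependency violations that usually block SIMD vectorization.

        import qualified Data.Vector.Unboxed as V

        -- saxpy over immutable arrays: result = a*x + y.
        -- Since x and y are immutable, no iteration can overwrite a value
        -- that another iteration reads, so the element-wise loop is a
        -- straightforward candidate for automatic SIMD vectorization.
        saxpy :: Double -> V.Vector Double -> V.Vector Double -> V.Vector Double
        saxpy a = V.zipWith (\xi yi -> a * xi + yi)

        main :: IO ()
        main = print (V.sum (saxpy 2.0 (V.enumFromN 0 1000) (V.replicate 1000 1.0)))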

    GPRM: a high performance programming framework for manycore processors

    Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for shared resources. Thread and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge.

    In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks and their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks rather than threads.

    We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations.

    We use OpenMP, the most popular model for shared-memory parallel programming, as GPRM’s main competitor for solving three well-known problems on both platforms: LU factorisation of sparse matrices, image convolution, and linked list processing. We focus on proposing solutions that best fit GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for LU factorisation results in a notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the image convolution benchmark, and show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for linked list processing and performs better than the OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
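    GPRM programs are written in GPRM’s own task language, which the abstract does not show. As a language-neutral illustration of the granularity point above (combining many small tasks into fewer larger ones to amortise task-management overhead), here is a hedged Haskell sketch using the standard parallel strategies library; the function names and the chunking scheme are illustrative only, not GPRM’s mechanism.

        import Control.DeepSeq (NFData)
        import Control.Parallel.Strategies (parMap, rdeepseq)

        -- Split a list into chunks of at most n elements.
        chunksOf :: Int -> [a] -> [[a]]
        chunksOf _ [] = []
        chunksOf n xs = take n xs : chunksOf n (drop n xs)

        -- Fine-grained: one parallel task per element. For very short
        -- computations, per-task overhead can dominate the useful work.
        mapFine :: NFData b => (a -> b) -> [a] -> [b]
        mapFine = parMap rdeepseq

        -- Coarsened: one parallel task per chunk, amortising the per-task
        -- overhead over n elements (the same idea as merging small tasks).
        mapChunked :: NFData b => Int -> (a -> b) -> [a] -> [b]
        mapChunked n f xs = concat (parMap rdeepseq (map f) (chunksOf n xs))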

    SAT Compilation for Constraints over Structured Finite Domains

    A constraint is a formula in first-order logic expressing a relation between values of various domains. Constructing a propositional encoding is a successful technique for solving constraints, one that benefits from the substantial progress made in the development of modern SAT solvers. However, propositional encodings are generally created by developing a problem-specific generator program or by crafting them manually, which is often a time-consuming and error-prone process, especially for constraints over complex domains. Therefore, the present thesis introduces the constraint solver CO4, which automatically generates propositional encodings for constraints over structured finite domains written in a syntactical subset of the functional programming language Haskell. This subset of Haskell enables the specification of expressive and concise constraints by supporting user-defined algebraic data types, pattern matching, and polymorphic types, as well as higher-order and recursive functions. The constraint solver CO4 transforms a constraint written in this high-level language into a propositional formula. Once an external SAT solver has determined a satisfying assignment for the variables in the generated formula, a solution in the domain of discourse is derived. This approach is applicable even to finite restrictions of recursively defined algebraic data types. The present thesis describes all aspects of CO4 in detail: the language used for specifying constraints, the solving process and its correctness, as well as exemplary applications of CO4.
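    As a flavour of the approach, a CO4-style constraint is an ordinary Haskell predicate over algebraic data types; the solver then searches for an argument that makes it True. The graph-colouring sketch below is illustrative only: the problem instance is hypothetical, and the abstract does not show CO4’s actual compilation and solver entry points.

        -- A structured finite domain expressed as an algebraic data type.
        data Colour = Red | Green | Blue deriving (Eq, Show)

        -- Hypothetical problem instance: edges of a four-vertex graph.
        edges :: [(Int, Int)]
        edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

        -- The constraint: a colouring (one colour per vertex) is valid when
        -- no edge joins two vertices of the same colour. A CO4-style solver
        -- compiles such a predicate to a propositional formula, hands it to
        -- a SAT solver, and decodes the satisfying assignment back into a
        -- value of the domain of discourse (here, a list of Colours).
        constraint :: [Colour] -> Bool
        constraint colours = all ok edges
          where ok (u, v) = colours !! u /= colours !! v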

    Profiling optimised Haskell - causal analysis and implementation

    At the present time, performance optimisation of real-life Haskell programs is a bit of a “black art”. Programmers who can do it reliably are highly esteemed, doubly so if they manage to do it without sacrificing the character of the language by falling back to an “imperative style”. The reason is that while programming at a high level does not need to result in slow performance, it must rely on a delicate mix of optimisations and transformations to work out just right. Predicting how all these cogs will turn is hard enough, but when something goes wrong, the various transformations will have mangled the program to the point where even finding the crucial locations in the code can become a game of cat-and-mouse. In this work we will lift the veil on the performance of heavily transformed Haskell programs: using a formal causality analysis, we will track source code links from square one and maintain the connection all the way to the final costs generated by the program. This will allow us to implement a profiling solution that can measure performance with high accuracy while explaining in detail how we got to the point in question. Furthermore, we will directly support the performance analysis process by developing an interactive profiling user interface that allows rapid theory forming and evaluation, as well as deep analysis where required.
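    The thesis’s own profiler is not shown in the abstract. For context, the sketch below uses GHC’s standard cost-centre annotations, the conventional way of mapping run-time costs back to source code, whose attribution is exactly what heavy optimisation tends to blur and what the causal analysis above aims to keep precise.

        -- Compile with: ghc -O2 -prof -fprof-auto Main.hs
        -- Run with:     ./Main +RTS -p
        -- The resulting .prof report attributes time and allocation to the
        -- named cost centres below.
        mean :: [Double] -> Double
        mean xs = {-# SCC "mean" #-} (sum xs / fromIntegral (length xs))

        main :: IO ()
        main = print ({-# SCC "work" #-} mean [1 .. 1000000])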