4 research outputs found

    Achieving High-Performance the Functional Way: A Functional Pearl on Expressing High-Performance Optimizations as Rewrite Strategies

    Get PDF
    Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages - like C or OpenCL - force the programmer to intertwine the code describing functionality and optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend towards specialized hardware devices to further increase efficiency. Many emerging DSLs used in performance demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. Using a high-level - often functional - language, programmers focus on describing functionality in a declarative way. In some systems such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad-hoc predefined APIs that the compiler writers have exposed. In this functional pearl, we show how to employ functional programming techniques to solve this challenge with elegance. We present two functional languages that work together - each addressing a separate concern. RISE is a functional language for expressing computations using well known functional data-parallel patterns. ELEVATE is a functional language for describing optimization strategies. A high-level RISE program is transformed into a low-level form using optimization strategies written in ELEVATE . From the rewritten low-level program high-performance parallel code is automatically generated. In contrast to existing high-performance domain-specific systems with scheduling APIs, in our approach programmers are not restricted to a set of built-in operations and optimizations but freely define their own computational patterns in RISE and optimization strategies in ELEVATE in a composable and reusable way. We show how our holistic functional approach achieves competitive performance with the state-of-the-art imperative systems Halide and TVM

    Regression and Singular Value Decomposition in Dynamic Graphs

    Full text link
    Most of real-world graphs are {\em dynamic}, i.e., they change over time. However, while problems such as regression and Singular Value Decomposition (SVD) have been studied for {\em static} graphs, they have not been investigated for {\em dynamic} graphs, yet. In this paper, we introduce, motivate and study regression and SVD over dynamic graphs. First, we present the notion of {\em update-efficient matrix embedding} that defines the conditions sufficient for a matrix embedding to be used for the dynamic graph regression problem (under l2l_2 norm). We prove that given an n×mn \times m update-efficient matrix embedding (e.g., adjacency matrix), after an update operation in the graph, the optimal solution of the graph regression problem for the revised graph can be computed in O(nm)O(nm) time. We also study dynamic graph regression under least absolute deviation. Then, we characterize a class of matrix embeddings that can be used to efficiently update SVD of a dynamic graph. For adjacency matrix and Laplacian matrix, we study those graph update operations for which SVD (and low rank approximation) can be updated efficiently

    Polyhedral optimizations of RNA-RNA interaction computations

    Get PDF
    2017 Fall.Includes bibliographical references.Studying RNA-RNA interaction has led to major successes in the treatment of some cancers, including colon, breast and pancreatic cancer by suppressing the gene expression involved in the development of these diseases. The problem with such programs is that they are computationally and memory intensive: O(N4) space and O(N6) time complexity. Moreover, the entire application is complicated, and involves many mutually recursive data variables. We address the problem of speeding up a surrogate kernel (named OSPSQ) that captures the main dependence pattern found in two widely used RNA-RNA interaction applications IRIS and piRNA. The structure of the OSPSQ kernel perfectly fits the constraints of the polyhedral model, a well-developed technology for optimizing codes that belong to many specialized domains. However, the current state-of-the-art automatic polyhedral tools do not significantly improve the performance of the baseline implementation of OSPSQ. With simple techniques like loop permutation and skewing, we achieve an average of 17x sequential and 31x parallel speedup on a standard modern multi-core platform (Intel Broadwell, E5-1650v4). This performance represents 75% and 88% of attainable single-core and multi-core L1 bandwidth. For further performance improvement, we describe how to tile all six dimensions and also formulate the associated memory trade-off. In the future, we plan to implement these tiling strategies, explore the performance of the code for various tile sizes and optimize the whole piRNA application

    Proceedings of the 7th International Conference on PGAS Programming Models

    Get PDF
    corecore