
    Fast Möbius and Zeta Transforms

    Möbius inversion of functions on partially ordered sets (posets) $\mathcal{P}$ is a classical tool in combinatorics. For finite posets it consists of two mutually inverse linear transformations, called the zeta and Möbius transform, respectively. In this paper we provide novel fast algorithms for both that require $O(nk)$ time and space, where $n = |\mathcal{P}|$ and $k$ is the width (length of the longest antichain) of $\mathcal{P}$, compared to $O(n^2)$ for a direct computation. Our approach assumes that $\mathcal{P}$ is given as a directed acyclic graph (DAG) $(\mathcal{E}, \mathcal{P})$. The algorithms are then constructed using a chain decomposition for a one-time cost of $O(|\mathcal{E}| + |\mathcal{E}_\text{red}|\,k)$, where $\mathcal{E}_\text{red}$ is the edge set of the DAG's transitive reduction. We show benchmarks with implementations of all algorithms, including parallelized versions. The results show that our algorithms enable Möbius inversion on posets with millions of nodes in seconds if the defining DAGs are sufficiently sparse.

    Comment: 16 pages, 7 figures, submitted for review
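
    For reference, the direct computation the paper improves on is straightforward to state. Below is a minimal, illustrative C++ sketch of the O(n^2) zeta transform and its inverse on a finite poset given as a reachability matrix whose indices follow a linear extension; the representation and names are ours for illustration, not the paper's API, and the paper's O(nk) algorithms instead exploit a chain decomposition of the DAG.

        #include <cstdio>
        #include <vector>

        // Direct O(n^2) zeta and Moebius transforms on a finite poset given as
        // a boolean matrix leq[x][y] meaning "x <= y". Assumes indices form a
        // linear extension (x <= y implies x's index comes first), so the
        // Moebius transform can invert bottom-up.
        std::vector<double> zeta(const std::vector<std::vector<bool>>& leq,
                                 const std::vector<double>& f) {
            size_t n = f.size();
            std::vector<double> g(n, 0.0);
            for (size_t y = 0; y < n; ++y)       // g(y) = sum of f(x) over x <= y
                for (size_t x = 0; x < n; ++x)
                    if (leq[x][y]) g[y] += f[x];
            return g;
        }

        std::vector<double> moebius(const std::vector<std::vector<bool>>& leq,
                                    const std::vector<double>& g) {
            size_t n = g.size();
            std::vector<double> f(n, 0.0);
            for (size_t y = 0; y < n; ++y) {     // f(y) = g(y) - sum of f(x), x < y
                f[y] = g[y];
                for (size_t x = 0; x < y; ++x)
                    if (leq[x][y]) f[y] -= f[x];
            }
            return f;
        }

        int main() {
            // Example poset: {1, 2, 3, 6} ordered by divisibility.
            std::vector<int> v = {1, 2, 3, 6};
            std::vector<std::vector<bool>> leq(4, std::vector<bool>(4));
            for (int i = 0; i < 4; ++i)
                for (int j = 0; j < 4; ++j)
                    leq[i][j] = (v[j] % v[i] == 0);
            std::vector<double> f = {1, 2, 3, 4};
            auto g = zeta(leq, f);
            auto h = moebius(leq, g);            // recovers f exactly
            for (int i = 0; i < 4; ++i)
                std::printf("f=%g zeta=%g inverse=%g\n", f[i], g[i], h[i]);
            return 0;
        }

    Both transforms above visit every comparable pair, which is exactly the O(n^2) cost the chain-decomposition algorithms avoid.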

    Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver


    A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

    The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations and graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
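
    To make the conflicting updates concrete: exploiting symmetry means each stored upper-triangle entry a_ij contributes to both y_i and y_j, so threads working on different rows can race on the same y_j. The C++ sketch below shows the generic color-by-color execution scheme that such a coloring enables, assuming the rows of each color (computed beforehand by a coloring library such as RACE) touch pairwise disjoint result entries; the data structures and names here are illustrative, not the RACE API.

        #include <algorithm>
        #include <cstdio>
        #include <vector>
        // Compile with -fopenmp to enable the parallel pragma.

        struct CsrUpper {                  // upper triangle incl. diagonal, CSR
            std::vector<int> rowPtr, col;
            std::vector<double> val;
        };

        // colors[c] holds the rows of color c; rows sharing a color must not
        // touch the same y entries (this is what the coloring guarantees).
        void symmSpMV(const CsrUpper& A,
                      const std::vector<std::vector<int>>& colors,
                      const std::vector<double>& x, std::vector<double>& y) {
            std::fill(y.begin(), y.end(), 0.0);
            for (const auto& rows : colors) {  // colors run one after another
                #pragma omp parallel for schedule(static)
                for (size_t r = 0; r < rows.size(); ++r) {
                    int i = rows[r];
                    for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
                        int j = A.col[k];
                        double a = A.val[k];
                        y[i] += a * x[j];              // upper-triangle part
                        if (j != i) y[j] += a * x[i];  // mirrored lower part
                    }
                }   // implicit barrier separates the colors
            }
        }

        int main() {
            // Symmetric 3x3 matrix [[2,1,0],[1,2,1],[0,1,2]], upper part stored.
            CsrUpper A{{0, 2, 4, 5}, {0, 1, 1, 2, 2}, {2, 1, 2, 1, 2}};
            // Rows 0 and 2 touch disjoint y entries, so they share a color.
            std::vector<std::vector<int>> colors = {{0, 2}, {1}};
            std::vector<double> x = {1, 1, 1}, y(3);
            symmSpMV(A, colors, x, y);
            std::printf("y = %g %g %g\n", y[0], y[1], y[2]);  // expected: 3 4 3
            return 0;
        }

    The barrier between colors is the entire synchronization cost, so fewer and better-balanced colors mean fewer idle threads, which is precisely the load-balancing shortcoming of earlier coloring methods that RACE targets.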

    Doctor of Philosophy

    Sparse matrix codes are found in numerous applications ranging from iterative numerical solvers to graph analytics. Achieving high performance on these codes has, however, been a significant challenge, mainly due to array access indirection, for example of the form A[B[i]]. Indirect accesses make precise dependence analysis impossible at compile time, and hence prevent many parallelizing and locality-optimizing transformations from being applied. The expert user relies on manually written libraries to tailor the sparse code and data representations best suited to the target architecture from a general sparse matrix representation. However, libraries have limited composability, address very specific optimization strategies, and have to be rewritten as new architectures emerge. In this dissertation, we explore the use of the inspector/executor methodology to accomplish the code and data transformations that tailor high-performance sparse matrix representations. We devise and embed abstractions for such inspector/executor transformations within a compiler framework so that they can be composed with a rich set of existing polyhedral compiler transformations to derive complex transformation sequences for high performance. We demonstrate the automatic generation of inspector/executor code, which orchestrates code and data transformations to derive high-performance representations for the Sparse Matrix Vector Multiply kernel in particular. We also show how the same transformations may be integrated into sparse matrix and graph applications such as Sparse Matrix Matrix Multiply and Stochastic Gradient Descent, respectively. The specific constraints of these applications, such as problem size and dependence structure, necessitate unique sparse matrix representations that can be realized using our transformations. Computations such as Gauss-Seidel, with loop-carried dependences at the outermost loop, necessitate different strategies for high performance. Specifically, we organize the computation into level sets or wavefronts of irregular size, such that iterations within a wavefront may be scheduled in parallel but different wavefronts have to be synchronized. We demonstrate automatic code generation of high-performance inspectors that do explicit dependence testing and level-set construction at runtime, as well as high-performance executors, which are the actual parallelized computations. For the above sparse matrix applications, we automatically generate inspector/executor code comparable in performance to manually tuned libraries.
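
    As a concrete illustration of the inspector/executor split for the wavefront case: the inspector follows the indirect accesses once at runtime and assigns each row to a level one deeper than the rows it depends on; the executor then solves level by level, with all rows inside a level independent. The C++ sketch below does this by hand for a sparse lower triangular solve (Gauss-Seidel has the same structure); in the dissertation both parts are generated automatically, and all names here are ours.

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        struct CsrLower {                  // strictly lower triangle in CSR,
            std::vector<int> rowPtr, col;  // diagonal stored separately
            std::vector<double> val, diag;
        };

        // Inspector: explicit dependence testing and level-set construction,
        // done once per sparsity pattern. Row i lands one level above the
        // deepest row it reads through the indirection L.col[k].
        std::vector<std::vector<int>> buildLevels(const CsrLower& L) {
            int n = (int)L.diag.size();
            std::vector<int> level(n, 0);
            int maxLevel = 0;
            for (int i = 0; i < n; ++i) {
                for (int k = L.rowPtr[i]; k < L.rowPtr[i + 1]; ++k)
                    level[i] = std::max(level[i], level[L.col[k]] + 1);
                maxLevel = std::max(maxLevel, level[i]);
            }
            std::vector<std::vector<int>> levels(maxLevel + 1);
            for (int i = 0; i < n; ++i) levels[level[i]].push_back(i);
            return levels;
        }

        // Executor: the actual solve of L z = b. Rows within a level have no
        // mutual dependences, so the inner loop could run under an OpenMP
        // pragma with synchronization only at level boundaries.
        void lowerSolve(const CsrLower& L,
                        const std::vector<std::vector<int>>& levels,
                        const std::vector<double>& b, std::vector<double>& z) {
            for (const auto& rows : levels)
                for (int i : rows) {       // parallelizable within a level
                    double s = b[i];
                    for (int k = L.rowPtr[i]; k < L.rowPtr[i + 1]; ++k)
                        s -= L.val[k] * z[L.col[k]];
                    z[i] = s / L.diag[i];
                }
        }

        int main() {
            // L = [[2,0,0],[1,2,0],[0,1,2]] with the diagonal kept separately.
            CsrLower L;
            L.rowPtr = {0, 0, 1, 2};
            L.col    = {0, 1};
            L.val    = {1, 1};
            L.diag   = {2, 2, 2};
            std::vector<double> b = {2, 4, 6}, z(3);
            auto levels = buildLevels(L);  // inspector, run once
            lowerSolve(L, levels, b, z);   // executor, run per solve
            std::printf("z = %g %g %g\n", z[0], z[1], z[2]);  // 1 1.5 2.25
            return 0;
        }

    The inspector's cost is linear in the number of nonzeros and is paid once per sparsity pattern, so it amortizes over the repeated solves typical of iterative methods.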