
    A Systematic Survey of General Sparse Matrix-Matrix Multiplication

    SpGEMM (general sparse matrix-matrix multiplication) has attracted much attention from researchers in the fields of multigrid methods and graph analysis. Many optimization techniques have been developed for particular application fields and computing architectures over the decades. The objective of this paper is to provide a structured and comprehensive overview of the research on SpGEMM. Existing optimization techniques are grouped into categories based on their target problems and architectures. Covered topics include SpGEMM applications, size prediction of the result matrix, matrix partitioning and load balancing, result accumulation, and target-architecture-oriented optimization. The rationales of the different algorithms in each category are analyzed, and a wide range of SpGEMM algorithms are summarized. This survey covers the progress and research status of SpGEMM optimization from 1977 to 2019. In addition, an experimental comparative study of existing implementations on CPU and GPU is presented. Based on our findings, we highlight future research directions and how future studies can leverage our findings to encourage better design and implementation.
    Comment: 19 pages, 11 figures, 2 tables, 4 algorithms
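
    As an illustration of the row-wise (Gustavson-style) formulation that much of the surveyed work builds on, a minimal Python sketch follows: row i of C is formed by scaling and accumulating the rows of B selected by the nonzeros of row i of A. The CSR arrays, the function name spgemm_rowwise, and the test matrices are illustrative assumptions, not material from the survey.

# Minimal sketch of Gustavson's row-wise SpGEMM with a hash-map accumulator.
# CSR layout (indptr, indices, data) is assumed; values are illustrative.
def spgemm_rowwise(a_indptr, a_indices, a_data,
                   b_indptr, b_indices, b_data):
    """Compute C = A * B for CSR matrices A and B; return CSR arrays of C."""
    c_indptr, c_indices, c_data = [0], [], []
    for i in range(len(a_indptr) - 1):
        acc = {}  # hash-map accumulator for row i of C
        for ap in range(a_indptr[i], a_indptr[i + 1]):
            k, a_ik = a_indices[ap], a_data[ap]
            for bp in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[bp]
                acc[j] = acc.get(j, 0.0) + a_ik * b_data[bp]
        for j in sorted(acc):  # emit row i in column order
            c_indices.append(j)
            c_data.append(acc[j])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data

# Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]] in CSR form.
print(spgemm_rowwise([0, 1, 2], [0, 1], [1.0, 2.0],
                     [0, 1, 2], [1, 0], [3.0, 4.0]))
# -> ([0, 1, 2], [1, 0], [3.0, 8.0]), i.e. C = [[0, 3], [8, 0]]

    The hash-map accumulator is only one option under the survey's "result accumulation" topic; alternatives such as sort-based or dense accumulators trade memory and sorting cost differently, which matters especially on GPUs.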

    Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure (revised version)

    Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount to minimize the time-to-solution (e.g., to study dynamical problems or for real-time analysis) and to accommodate problems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match for the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size.
    Block-sparse tensors arise in many scientific domains. This report studies the parallelization of a contraction kernel essential for manipulating such tensors, which can be expressed as a matrix product C ← C + AB, where all three matrices are block-sparse, the tiles of A and B have heterogeneous sizes, and B is square of size n while A and C are rectangular of size m × n with m << n. We propose an implementation on the distributed-memory Summit platform, where each node is equipped with several GPUs, using the PaRSEC task environment. We obtain good performance for problem sizes unmatched to date.
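
    The contraction kernel studied here reduces to a block-sparse product C ← C + AB over dense tiles of heterogeneous sizes. A serial Python analogue is sketched below for illustration only; the dict-of-tiles layout, the function name block_sparse_gemm, and the example blocks are assumptions and do not reflect the paper's distributed PaRSEC/multi-GPU implementation.

# Minimal sketch of a block-sparse GEMM, C <- C + A*B, where each matrix is
# stored as a dict mapping (block_row, block_col) to a dense NumPy tile.
# Heterogeneous tile sizes are allowed as long as inner dimensions agree.
import numpy as np

def block_sparse_gemm(C, A, B):
    """Accumulate C[i, j] += A[i, k] @ B[k, j] over all stored block pairs."""
    # Index B's nonzero blocks by their block-row for fast lookup.
    b_by_row = {}
    for (k, j), tile in B.items():
        b_by_row.setdefault(k, []).append((j, tile))
    for (i, k), a_tile in A.items():
        for j, b_tile in b_by_row.get(k, []):
            if (i, j) in C:
                C[(i, j)] += a_tile @ b_tile
            else:
                C[(i, j)] = a_tile @ b_tile
    return C

# Example with heterogeneous tile sizes: two nonzero blocks in A and in B.
rng = np.random.default_rng(0)
A = {(0, 0): rng.random((2, 3)), (0, 1): rng.random((2, 4))}
B = {(0, 0): rng.random((3, 5)), (1, 0): rng.random((4, 5))}
C = block_sparse_gemm({}, A, B)
print(C[(0, 0)].shape)  # -> (2, 5)

    In the paper's setting, roughly speaking, each such tile product becomes a task dispatched by the PaRSEC runtime across nodes and GPUs rather than an iteration of a serial loop.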