Implementing Push-Pull Efficiently in GraphBLAS
We factor Beamer's push-pull, also known as direction-optimized
breadth-first search (DOBFS), into three separable optimizations, and analyze
them for generalizability, asymptotic speedup, and contribution to overall speedup.
We demonstrate that masking is critical for high performance and can be
generalized to all graph algorithms where the sparsity pattern of the output is
known a priori. We show that these graph algorithm optimizations, which
together constitute DOBFS, can be neatly and separably described using linear
algebra and can be expressed in the GraphBLAS linear-algebra-based framework.
We provide experimental evidence that with these optimizations, a DOBFS
expressed in a linear-algebra-based graph framework attains competitive
performance with state-of-the-art graph frameworks on the GPU and on a
multi-threaded CPU, achieving 101 GTEPS on a Scale 22 RMAT graph.Comment: 11 pages, 7 figures, International Conference on Parallel Processing
(ICPP) 201
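The direction switch the abstract describes can be sketched in plain Python. This is an illustrative reconstruction, not the paper's GraphBLAS code: the function name, the switch threshold, and the set-based frontier are our assumptions, and the pull step assumes an undirected graph so that `adj[v]` also lists in-neighbors.

```python
# Hypothetical sketch of direction-optimized BFS (push-pull).
# Push corresponds to a sparse vector-matrix product; pull corresponds to a
# masked product whose output sparsity (the unvisited set) is known a priori.

def bfs_push_pull(adj, source, switch_ratio=0.05):
    """Level-synchronous BFS over an adjacency-list graph `adj`.

    Chooses push (scatter from the frontier) when the frontier is small,
    and pull (gather into unvisited vertices) when it is large.
    """
    n = len(adj)
    visited = {source}
    frontier = {source}
    level = {source: 0}
    depth = 0
    while frontier:
        depth += 1
        if len(frontier) < switch_ratio * n:
            # Push: visit only the frontier's out-edges (SpMSpV-like work).
            nxt = {v for u in frontier for v in adj[u] if v not in visited}
        else:
            # Pull: iterate unvisited vertices (the mask) and probe their
            # neighbors; assumes adj[v] doubles as the in-neighbor list.
            nxt = {v for v in range(n) if v not in visited
                   and any(u in frontier for u in adj[v])}
        for v in nxt:
            level[v] = depth
        visited |= nxt
        frontier = nxt
    return level
```

The pull branch stops probing a vertex as soon as one frontier neighbor is found, which is the short-circuit that makes the bottom-up direction pay off on large frontiers.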
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key
primitive for many high performance graph algorithms as well as for some linear
solvers, such as algebraic multigrid. Here we show that SpGEMM also yields
efficient algorithms for general sparse-matrix indexing in distributed memory,
provided that the underlying SpGEMM implementation is sufficiently flexible and
scalable. We demonstrate that our parallel SpGEMM methods, which use
two-dimensional block data distributions with serial hypersparse kernels, are
indeed highly flexible, scalable, and memory-efficient in the general case.
This algorithm is the first to yield increasing speedup on an unbounded number
of processors; our experiments show scaling up to thousands of processors in a
variety of test scenarios.
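The reduction from sparse-matrix indexing to SpGEMM can be illustrated with a small sketch: extracting A(I, J) is two multiplications by boolean selection matrices, A(I, J) = R · A · Q. The dict-of-dicts format and the helper names below are ours for illustration, not the paper's distributed implementation.

```python
# Submatrix extraction via SpGEMM: A(I, J) = R * A * Q, where R picks rows
# (R[r][I[r]] = 1) and Q picks columns (Q[J[c]][c] = 1).

def spgemm(A, B):
    """Row-wise sparse product over dict-of-dicts {row: {col: val}}."""
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a in row.items():
            for j, b in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a * b
        if acc:
            C[i] = acc
    return C

def extract(A, I, J):
    """Return A(I, J) using two SpGEMMs with selection matrices."""
    R = {r: {i: 1} for r, i in enumerate(I)}   # row selector
    Q = {j: {c: 1} for c, j in enumerate(J)}   # column selector
    return spgemm(spgemm(R, A), Q)
```

Because R and Q have one nonzero per selected row/column, they are extremely sparse (hypersparse in the distributed 2D setting), which is why a flexible SpGEMM is the right primitive here.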
Automatic Generation of Efficient Sparse Tensor Format Conversion Routines
This paper shows how to generate code that efficiently converts sparse
tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL,
and many others. We decompose sparse tensor conversion into three logical
phases: coordinate remapping, analysis, and assembly. We then develop a
language that precisely describes how different formats group together and
order a tensor's nonzeros in memory. This lets a compiler emit code that
performs complex remappings of nonzeros when converting between formats. We
also develop a query language that can extract statistics about sparse tensors,
and we show how to emit efficient analysis code that computes such queries.
Finally, we define an abstract interface that captures how data structures for
storing a tensor can be efficiently assembled given specific statistics about
the tensor. Disparate formats can implement this common interface, thus letting
a compiler emit optimized sparse tensor conversion code for arbitrary
combinations of many formats without hard-coding for any specific combination.
Our evaluation shows that the technique generates sparse tensor conversion
routines with performance between 1.00× and 2.01× that of hand-optimized
versions in SPARSKIT and Intel MKL, two popular sparse linear algebra
libraries. By emitting code that avoids materializing temporaries, which
both libraries need for many combinations of source and target formats, our
technique outperforms those libraries by 1.78× to 4.01× for CSC/COO to
DIA/ELL conversion.
Comment: Presented at PLDI 202
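The three phases named in the abstract can be seen in even the simplest conversion, COO to CSR. The sketch below is a minimal hand-written instance of the pattern (the paper generates such routines automatically); the function name and argument layout are our assumptions.

```python
# COO -> CSR conversion in three phases: coordinate remapping, analysis,
# assembly. `coords` is a list of (row, col) pairs, `vals` the nonzero values.

def coo_to_csr(nrows, coords, vals):
    # Phase 1: coordinate remapping -- order nonzeros row-major.
    order = sorted(range(len(vals)), key=lambda k: coords[k])
    # Phase 2: analysis -- a "query" counting nonzeros per row.
    counts = [0] * nrows
    for i, _ in coords:
        counts[i] += 1
    # Phase 3: assembly -- prefix-sum the counts into the row pointer,
    # then place column indices and values in remapped order.
    rowptr = [0] * (nrows + 1)
    for i in range(nrows):
        rowptr[i + 1] = rowptr[i] + counts[i]
    colind = [coords[k][1] for k in order]
    data = [vals[k] for k in order]
    return rowptr, colind, data
```

For formats like DIA or ELL, the analysis phase would instead query statistics such as the number of occupied diagonals or the maximum row length, which is what the paper's query language expresses.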
Unraveling the functional dark matter through global metagenomics
Comment: 30 pages, 4 figures, 1 table, supplementary information https://doi.org/10.1038/s41586-023-06583-7
Data availability: All of the analysed datasets along with their corresponding sequences are available from the IMG system (http://img.jgi.doe.gov/). A list of the datasets used in this study is provided in Supplementary Data 8. All data from the protein clusters, including sequences, multiple alignments, HMM profiles, 3D structure models, and taxonomic and ecosystem annotation, are available through NMPFamsDB, publicly accessible at www.nmpfamsdb.org. The 3D models are also available at ModelArchive under accession code ma-nmpfamsdb.
Code availability: Sequence analysis was performed using Tantan (https://gitlab.com/mcfrith/tantan), BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), LAST (https://gitlab.com/mcfrith/last), HMMER (http://hmmer.org/) and HH-suite3 (https://github.com/soedinglab/hh-suite). Clustering was performed using HipMCL (https://bitbucket.org/azadcse/hipmcl/src/master/). Additional taxonomic annotation was performed using Whokaryote (https://github.com/LottePronk/whokaryote), EukRep (https://github.com/patrickwest/EukRep), DeepVirFinder (https://github.com/jessieren/DeepVirFinder) and MMseqs2 (https://github.com/soedinglab/MMseqs2). 3D modelling was performed using AlphaFold2 (https://github.com/deepmind/alphafold) and TrRosetta2 (https://github.com/RosettaCommons/trRosetta2). Structural alignments were performed using TMalign (https://zhanggroup.org/TM-align/) and MMalign (https://zhanggroup.org/MM-align/). All custom scripts used for the generation and analysis of the data are available at Zenodo (https://doi.org/10.5281/zenodo.8097349)
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities1,2. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes.
Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database3. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
With the institutional support of the "Severo Ochoa Centre of Excellence" accreditation (CEX2019-000928-S). Peer reviewed.
A work-efficient parallel sparse matrix-sparse vector multiplication algorithm.
We design and develop a work-efficient multithreaded algorithm for sparse
matrix-sparse vector multiplication (SpMSpV) where the matrix, the input
vector, and the output vector are all sparse. SpMSpV is an important primitive
in the emerging GraphBLAS standard and is the workhorse of many graph
algorithms including breadth-first search, bipartite graph matching, and
maximal independent set. As thread counts increase, existing multithreaded
SpMSpV algorithms can spend more time accessing the sparse matrix data
structure than doing arithmetic. Our shared-memory parallel SpMSpV algorithm is
work-efficient in the sense that its total work is proportional to the number
of arithmetic operations required. The key insight is to avoid having each
thread individually scan the list of matrix columns.
Our algorithm is simple to implement and operates on existing column-based
sparse matrix formats. It performs well on diverse matrices and vectors with
heterogeneous sparsity patterns. A high-performance implementation of the
algorithm attains up to 15x speedup on a 24-core Intel Ivy Bridge processor and
up to 49x speedup on a 64-core Intel KNL manycore processor. In contrast to
implementations of existing algorithms, the performance of our algorithm is
sustained on a variety of different input types, including matrices
representing scale-free and high-diameter graphs.
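The work-efficiency property has a simple serial analogue: with a column-based matrix format, y = A·x only needs to touch the columns where x is nonzero, so the work is proportional to the arithmetic performed rather than to the matrix dimension. The sketch below shows that access pattern; it is our illustration, serial for clarity, whereas the paper's contribution is keeping this property under multithreading.

```python
# SpMSpV sketch: sparse matrix (column-based) times sparse vector.
# A_cols maps each column index to its list of (row, value) nonzeros;
# x maps column indices to values; the result y is also sparse.

def spmspv(A_cols, x):
    y = {}
    for j, xj in x.items():             # visit only columns where x is nonzero
        for i, a in A_cols.get(j, ()):  # scatter-accumulate into the output
            y[i] = y.get(i, 0) + a * xj
    return y
```

In a BFS built on this primitive, x is the current frontier and y the candidate next frontier, which is why SpMSpV is the workhorse the abstract describes.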
Highly Parallel Sparse Matrix-Matrix Multiplication
Abstract. Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient algorithms for general sparse-matrix indexing in distributed memory, provided that the underlying SpGEMM implementation is sufficiently flexible and scalable. We demonstrate that our parallel SpGEMM methods, which use two-dimensional block data distributions with serial hypersparse kernels, are indeed highly flexible, scalable, and memory-efficient in the general case. This algorithm is the first to yield increasing speedup on an unbounded number of processors; our experiments show scaling up to thousands of processors in a variety of test scenarios.
Key words. parallel computing, numerical linear algebra, sparse matrix-matrix multiplication, SpGEMM, sparse matrix indexing, sparse matrix assignment, two-dimensional data decomposition, hypersparsity, graph algorithms, sparse SUMMA, subgraph extraction, graph contraction, graph batch update
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in
various graph, scientific computing and machine learning algorithms. In this
paper, we consider SpGEMMs performed on hundreds of thousands of processors
generating trillions of nonzeros in the output matrix. Distributed SpGEMM at
this extreme scale faces two key challenges: (1) high communication cost and
(2) inadequate memory to generate the output. We address these challenges with
an integrated communication-avoiding and memory-constrained SpGEMM algorithm
that scales to 262,144 cores (more than 1 million hardware threads) and can
multiply sparse matrices of any size as long as inputs and a fraction of output
fit in the aggregated memory. As we go from 16,384 cores to 262,144 cores on a
Cray XC40 supercomputer, the new SpGEMM algorithm runs 10x faster when
multiplying large-scale protein-similarity matrices.
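The memory-constrained idea, that the output need not fit in memory all at once, can be sketched in miniature: multiply A against column batches of B and emit one output slab at a time. This is our single-process illustration of the batching principle only; the generator name and dict-of-dicts format are assumptions, and the paper's algorithm additionally avoids communication across hundreds of thousands of cores.

```python
# Batched SpGEMM sketch: when the full C = A * B cannot be held in memory,
# restrict B to a window of `batch` columns and produce C slab by slab.
# Matrices are dict-of-dicts {row: {col: val}}.

def batched_spgemm(A, B, ncols, batch):
    for lo in range(0, ncols, batch):
        hi = min(lo + batch, ncols)
        # Restrict B to the current column batch [lo, hi).
        Bslab = {k: {j: v for j, v in row.items() if lo <= j < hi}
                 for k, row in B.items()}
        C = {}
        for i, row in A.items():
            acc = {}
            for k, a in row.items():
                for j, b in Bslab.get(k, {}).items():
                    acc[j] = acc.get(j, 0) + a * b
            if acc:
                C[i] = acc
        yield lo, C  # slab of C covering columns [lo, hi)
```

Each slab can be written out or consumed before the next is formed, so peak memory scales with the batch width rather than with the trillions of output nonzeros the abstract mentions.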