10 research outputs found

Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis

Author: Cheshmi Kazem
Dehnavi Maryam Mehri
Kamil Shoaib
Strout Michelle Mills
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/05/2017
Field of study

Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes. The computation patterns in sparse numerical methods are guided by the input sparsity structure and the sparse algorithm itself. In many real-world simulations, the sparsity pattern changes little or not at all. Sympiler takes advantage of these properties to symbolically analyze sparse codes at compile-time and to apply inspector-guided transformations that enable applying low-level transformations to sparse codes. As a result, the Sympiler-generated code outperforms highly-optimized matrix factorization codes from commonly-used specialized libraries, obtaining average speedups over Eigen and CHOLMOD of 3.8X and 1.5X, respectively. Comment: 12 pages
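The split between symbolic analysis and numeric work described in this abstract can be illustrated on a sparse lower-triangular solve. The following Python sketch is not Sympiler's generated code (Sympiler performs the analysis at compile time and emits specialized code); it is a hand-written illustration. The function names `symbolic_reach` and `numeric_solve` are hypothetical, and the matrix is assumed to be stored in compressed sparse column form with the diagonal entry first in each column.

```python
def symbolic_reach(col_ptr, row_idx, rhs_pattern):
    """Symbolic phase: list the columns of L that a sparse right-hand side
    touches, in an order that respects dependences (reverse DFS postorder).
    Depends only on the sparsity pattern, so it can be computed once."""
    n = len(col_ptr) - 1
    visited = [False] * n
    postorder = []
    for start in rhs_pattern:
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, col_ptr[start])]
        while stack:
            j, p = stack.pop()
            descended = False
            while p < col_ptr[j + 1]:
                i = row_idx[p]
                p += 1
                if i != j and not visited[i]:
                    visited[i] = True
                    stack.append((j, p))           # resume column j later
                    stack.append((i, col_ptr[i]))  # explore column i first
                    descended = True
                    break
            if not descended:
                postorder.append(j)
    postorder.reverse()
    return postorder


def numeric_solve(col_ptr, row_idx, values, b, order):
    """Numeric phase: solve L x = b, visiting only the columns in `order`.
    Assumes the diagonal entry is stored first in each column."""
    x = list(b)
    for j in order:
        x[j] /= values[col_ptr[j]]
        for p in range(col_ptr[j] + 1, col_ptr[j + 1]):
            x[row_idx[p]] -= values[p] * x[j]
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]] in compressed sparse column form.
col_ptr = [0, 2, 4, 5]
row_idx = [0, 1, 1, 2, 2]
values = [2.0, 1.0, 1.0, 3.0, 4.0]

order = symbolic_reach(col_ptr, row_idx, rhs_pattern=[1])
x = numeric_solve(col_ptr, row_idx, values, [0.0, 2.0, 0.0], order)
```

When the sparsity pattern stays fixed across many solves, `order` can be reused and only `numeric_solve` needs to run again with new values.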

arXiv.org e-Print Archive

Crossref

Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

Author: Abstractions
Anandkumar Animashree
Bader Brett W.
Bik Aart JC
Buluç Aydin
Elafrou A.
Katherine Yelick Im
Kincaid David R.
Kotlyar Vladimir
Monakov Alexander
Nandy Payal
Park Jongsoo
Pugh William
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/06/2020
Field of study

This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination. Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00 and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78 to 4.01× for CSC/COO to DIA/ELL conversion. Comment: Presented at PLDI 202
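The three phases named in this abstract (coordinate remapping, analysis, assembly) can be sketched by hand for one format pair, COO to DIA. This is only an illustration written for this listing, not the code the paper's compiler emits. The function name `coo_to_dia` is hypothetical, and the DIA layout assumed here stores `data[d][i] = A[i][i + offsets[d]]`, that is, each diagonal is indexed by row.

```python
def coo_to_dia(n_rows, rows, cols, vals):
    """Convert COO triples to a DIA layout in three explicit phases."""
    # Phase 1, coordinate remapping: (i, j) -> (j - i, i), so that nonzeros
    # on the same diagonal share their first coordinate.
    remapped = [(j - i, i, v) for i, j, v in zip(rows, cols, vals)]

    # Phase 2, analysis: query which diagonals hold at least one nonzero.
    offsets = sorted({k for k, _, _ in remapped})
    slot = {k: d for d, k in enumerate(offsets)}

    # Phase 3, assembly: allocate one dense row per stored diagonal and
    # scatter the values into place (unused slots stay zero).
    data = [[0.0] * n_rows for _ in offsets]
    for k, i, v in remapped:
        data[slot[k]][i] = v
    return offsets, data


# 3x3 matrix with a full main diagonal and two entries on the superdiagonal.
offsets, data = coo_to_dia(
    3,
    rows=[0, 1, 2, 0, 1],
    cols=[0, 1, 2, 1, 2],
    vals=[1.0, 2.0, 3.0, 4.0, 5.0],
)
```

Note that the sketch materializes the `remapped` list as a temporary for readability, which is the kind of intermediate the abstract says generated code can avoid.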

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Scheduling Transformation and Dependence Tests for Recursive Programs

Author: Kulkarni Milind
Sundararajah Kirshanthan
Publication venue: 'Purdue University (bepress)'
Publication date: 01/11/2018
Field of study

Scheduling transformations reorder the execution of operations in a program to improve locality and/or parallelism. The polyhedral model provides a general framework for performing instance-wise scheduling transformations for regular programs, reordering the iterations of loops that operate over dense arrays through transformations like tiling. There is no analogous framework for recursive programs—despite recent interest in optimizations like tiling and fusion for recursive applications. This paper presents PolyRec, the first general framework for applying scheduling transformations—like inlining, interchange, and code motion—to nested recursive programs and reasoning about their correctness. We describe the phases of PolyRec—representing dynamic instances, applying transformations, reasoning about correctness—and show that PolyRec is able to apply sophisticated, composed transformations to complex, nested recursive programs and improve performance through enhanced locality

Purdue E-Pubs

SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring

Author: Anderson Logan
Dias Adhitha
Kulkarni Milind
Pelenitsyn Artem
Sundararajah Kirshanthan
Publication venue
Publication date: 05/01/2024
Field of study

Automated code generation and performance optimizations for sparse tensor algebra are cardinal since they have become essential in many real-world applications like quantum computing, physics, chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to optimize and generate asymptotically better schedules for complex tensor expressions using kernel fission and fusion. We present a generalized loop transformation to achieve loop nesting for minimized memory footprint and reduced asymptotic complexity. Furthermore, we present an auto-scheduler that uses a partially ordered set-based cost model that uses both time and auxiliary memory complexities in its pruning stages. In addition, we highlight the use of SMT solvers in sparse auto-schedulers to prune the Pareto frontier of schedules to the smallest number of possible schedules with user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select asymptotically better schedules that use our compiler transformation to generate optimized code. Our results show that the auto-scheduler achieves orders of magnitude speedup compared to the TACO-generated code for several real-world tensor algebra computations on different real-world inputs.

arXiv.org e-Print Archive

A polyhedral compilation framework for loops with dynamic data-dependent bounds

Author: Cohen Albert
Kruse Michael
Zhao Jie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018
Field of study

We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization.

Crossref

INRIA a CCSD electronic archive server

Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures

Author: Emre Süreyya
Rastello Fabrice
Sadayappan Ponnuswamy
Sukumaran-Rajam Aravind
Publication venue: HAL CCSD
Publication date: 09/11/2020
Field of study

Tiling is a key technique to reduce data movement in matrix computations. While tiling is well understood and widely used for dense matrix/tensor computations, effective tiling of sparse matrix computations remains a challenging problem. This paper proposes a novel method to efficiently summarize the impact of the sparsity structure of a matrix on achievable data reuse as a one-dimensional signature, which is then used to build an analytical cost model for tile size optimization for sparse matrix computations. The proposed model-driven approach to sparse tiling is evaluated on two key sparse matrix kernels: Sparse Matrix-Dense Matrix Multiplication (SpMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM). Experimental results demonstrate that model-based tiled SpMM and SDDMM achieve high performance relative to the current state-of-the-art.
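For readers unfamiliar with the two kernels named in this abstract, the following Python sketch gives plain, untiled reference versions over a CSR matrix. It does not implement the paper's signature-based tile size model; the function names `spmm_csr` and `sddmm_csr` are hypothetical, and dense operands are assumed to be lists of rows.

```python
def spmm_csr(row_ptr, col_idx, vals, B):
    """SpMM: C = A @ B, with A sparse (CSR) and B dense."""
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0]) if B else 0
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            k, a = col_idx[p], vals[p]
            # Each nonzero A[i][k] scales one full row of B.
            for j in range(n_cols):
                C[i][j] += a * B[k][j]
    return C


def sddmm_csr(row_ptr, col_idx, vals, X, Y):
    """SDDMM: out[p] = S[i][j] * dot(X[i], Y[j]) for every stored nonzero
    S[i][j]; the dense product is sampled only at the sparse pattern."""
    out = [0.0] * len(vals)
    for i in range(len(row_ptr) - 1):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            out[p] = vals[p] * sum(x * y for x, y in zip(X[i], Y[j]))
    return out


# A = [[1, 0], [2, 3]] in CSR form.
row_ptr, col_idx, vals = [0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0]
C = spmm_csr(row_ptr, col_idx, vals, [[1.0, 2.0], [3.0, 4.0]])
S = sddmm_csr(row_ptr, col_idx, vals,
              X=[[1.0, 0.0], [0.0, 1.0]],
              Y=[[1.0, 2.0], [3.0, 4.0]])
```

In both kernels the rows of the dense operands that get touched are determined by `col_idx`, which is why the achievable data reuse under tiling depends on the sparsity structure.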

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Doctor of Philosophy

Author: Venkat Anand
Publication venue: University of Utah
Publication date: 01/01/2016
Field of study

Sparse matrix codes are found in numerous applications ranging from iterative numerical solvers to graph analytics. Achieving high performance on these codes has, however, been a significant challenge, mainly due to array access indirection, for example, of the form A[B[i]]. Indirect accesses make precise dependence analysis impossible at compile-time, and hence prevent many parallelizing and locality optimizing transformations from being applied. The expert user relies on manually written libraries to tailor the sparse code and data representations best suited to the target architecture from a general sparse matrix representation. However, libraries have limited composability, address very specific optimization strategies, and have to be rewritten as new architectures emerge. In this dissertation, we explore the use of the inspector/executor methodology to accomplish the code and data transformations to tailor high performance sparse matrix representations. We devise and embed abstractions for such inspector/executor transformations within a compiler framework so that they can be composed with a rich set of existing polyhedral compiler transformations to derive complex transformation sequences for high performance. We demonstrate the automatic generation of inspector/executor code, which orchestrates code and data transformations to derive high performance representations for the Sparse Matrix Vector Multiply kernel in particular. We also show how the same transformations may be integrated into sparse matrix and graph applications such as Sparse Matrix Matrix Multiply and Stochastic Gradient Descent, respectively. The specific constraints of these applications, such as problem size and dependence structure, necessitate unique sparse matrix representations that can be realized using our transformations. Computations such as Gauss-Seidel, with loop-carried dependences at the outermost loop, necessitate different strategies for high performance.
Specifically, we organize the computation into level sets or wavefronts of irregular size, such that iterations of a wavefront may be scheduled in parallel but different wavefronts have to be synchronized. We demonstrate automatic code generation of high performance inspectors that do explicit dependence testing and level set construction at runtime, as well as high performance executors, which are the actual parallelized computations. For the above sparse matrix applications, we automatically generate inspector/executor code comparable in performance to manually tuned libraries.
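The level-set (wavefront) construction described in this abstract can be sketched as a small inspector/executor pair. The Python below is a hand-written illustration, not the code generated by the dissertation's compiler framework. The names `build_wavefronts` and `solve_by_wavefront` are hypothetical, the matrix is assumed to be lower triangular in CSR form with a nonzero diagonal, and the executor runs each wavefront sequentially where a real implementation would run its rows in parallel.

```python
def build_wavefronts(row_ptr, col_idx):
    """Inspector: group rows into level sets. Row i depends on every
    row j < i with a stored entry (i, j), so it is placed one level
    after its deepest dependence."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    wavefronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        wavefronts[lvl].append(i)
    return wavefronts


def solve_by_wavefront(row_ptr, col_idx, vals, b, wavefronts):
    """Executor: forward substitution, one wavefront at a time. Rows in
    the same wavefront are independent of each other."""
    x = [0.0] * len(b)
    for wave in wavefronts:        # synchronization point between waves
        for i in wave:             # this loop could run in parallel
            s, diag = b[i], 1.0
            for p in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[p]
                if j == i:
                    diag = vals[p]
                else:
                    s -= vals[p] * x[j]
            x[i] = s / diag
    return x


# Lower-triangular 4x4 pattern: rows 0 and 1 are independent,
# row 2 depends on row 0, row 3 depends on rows 1 and 2.
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 4.0, 1.0, 1.0, 1.0, 2.0, 1.0]

waves = build_wavefronts(row_ptr, col_idx)
x = solve_by_wavefront(row_ptr, col_idx, vals, [2.0, 4.0, 3.0, 6.0], waves)
```

The inspector reads only `row_ptr` and `col_idx`, so its cost can be amortized whenever the same sparsity pattern is reused across many executor runs.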

The University of Utah: J. Willard Marriott Digital Library

An approach for code generation in the Sparse Polyhedral Framework

Author: Carter Larry
Ferrante Jeanne
Kreaseck Barbara
LaMielle Alan
Olschanowsky Catherine
Strout Michelle Mills
Publication venue: 'Elsevier BV'
Publication date: 01/04/2016
Field of study

Applications that manipulate sparse data structures contain memory reference patterns that are unknown at compile time due to indirect accesses such as A[B[i]]. To exploit parallelism and improve locality in such applications, prior work has developed a number of Run-Time Reordering Transformations (RTRTs). This paper presents the Sparse Polyhedral Framework (SPF) for specifying RTRTs and compositions thereof and algorithms for automatically generating efficient inspector and executor code to implement such transformations. Experimental results indicate that the performance of automatically generated inspectors and executors competes with the performance of hand-written ones when further optimization is done. We thank Jon Roelofs for his implementation of the IEGenCC tool, which converts C programs into the specification format IEGen expects as input. We thank Christopher Krieger, Andrew Stone, Tomofumi Yuki, and anonymous reviewers for their careful reading and suggestions. This work was sponsored by NSF CAREER Grant CCF-0746693, DOE Early Career Grant DE-SC3956, the CSCAPES Institute DOE Grant 7F-00323, and the CACHE project DOE Grant DE-SC04030. Available online 4 March 2016. 24 month embargo. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.
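To make the idea of a run-time reordering transformation concrete, the Python sketch below shows one simple data reordering for an access of the form A[B[i]]: data elements are packed in the order the index array first touches them. It is a hand-written illustration under that assumption, not output of the SPF tools named in the abstract, and the function names `first_touch_inspector` and `apply_reordering` are hypothetical.

```python
def first_touch_inspector(index, n_data):
    """Inspector: walk the index array B and assign each data element a new
    position in order of first access, so that elements touched close
    together in time end up close together in memory."""
    new_pos = [-1] * n_data
    next_free = 0
    for b in index:
        if new_pos[b] == -1:
            new_pos[b] = next_free
            next_free += 1
    for d in range(n_data):  # elements never accessed go at the end
        if new_pos[d] == -1:
            new_pos[d] = next_free
            next_free += 1
    return new_pos


def apply_reordering(data, index, new_pos):
    """Move the data to its new positions and rewrite the index array so
    that new_data[new_index[i]] == data[index[i]] for every i."""
    new_data = [None] * len(data)
    for old, new in enumerate(new_pos):
        new_data[new] = data[old]
    new_index = [new_pos[b] for b in index]
    return new_data, new_index


A = [10, 20, 30, 40]
B = [2, 0, 2, 3]

new_pos = first_touch_inspector(B, len(A))
A2, B2 = apply_reordering(A, B, new_pos)

# Executor: the original loop body, run over the reordered arrays.
gathered = [A2[B2[i]] for i in range(len(B2))]
```

The executor computes the same values as the original loop over `A` and `B`; only the memory layout and the contents of the index array have changed.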

Crossref

The University of Arizona
