Automatically Harnessing Sparse Acceleration
Sparse linear algebra is central to many scientific programs, yet compilers
fail to optimize it well. High-performance libraries are available, but
adoption costs are significant. Moreover, libraries tie programs into
vendor-specific software and hardware ecosystems, creating non-portable code.
In this paper, we develop a new approach based on our specification Language
for implementers of Linear Algebra Computations (LiLAC). Rather than requiring
the application developer to (re)write every program for a given library, the
burden is shifted to a one-off description by the library implementer. The
LiLAC-enabled compiler uses this to insert appropriate library routines without
source code changes.
LiLAC provides automatic data marshaling, maintaining state between calls and
minimizing data transfers. Appropriate places for library insertion are
detected in compiler intermediate representation, independent of source
languages.
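As a hedged illustration of the kind of pattern this targets (the kernel below is a
generic CSR sparse matrix-vector product, not LiLAC specification syntax, and the
named replacement libraries are assumptions), a plain loop nest like the following
is what a LiLAC-enabled compiler could recognize in its intermediate representation
and redirect to a tuned routine:

/* Generic CSR sparse matrix-vector multiply, y = A * x.
 * A LiLAC-enabled compiler could detect this loop nest in IR and replace
 * it with a call into a tuned library (e.g., an MKL or cuSPARSE SpMV)
 * without any change to the application source. Illustrative sketch only. */
void spmv_csr(int n, const int *rowptr, const int *col,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            sum += val[k] * x[col[k]];   /* indirect access through index array col */
        y[i] = sum;
    }
}

The data marshaling described above would then keep the library-side copies of
rowptr, col, and val alive across repeated calls, so the matrix is not re-packaged
or re-transferred on every invocation.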
We evaluated our approach on large-scale scientific applications written in FORTRAN;
standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. Across
heterogeneous platforms, applications, and data sets we show speedups of 1.1x to
over 10x without user intervention.
Comment: Accepted to CC 202
Automatic Sparse Computation Parallelization by Utilizing Domain-Specific Knowledge in Data Dependence Analysis
Sparse vectors, matrices, and tensors are commonly used to
compress the nonzero values of large data sets manipulated in data analytics,
scientific simulations, and machine learning computations.
As with general computations, parallelizing loops in sparse computations,
i.e., codes that manipulate sparse structures, is essential to efficiently utilize
available parallel architectures.
Sparse computations often exhibit partial parallelism
in loops that are sequential in the corresponding dense computation,
because the data dependences are themselves sparse: they arise from indirect
memory accesses through index arrays (e.g., a column index array such as col).
Such dependences can only be discovered at runtime, when the contents of the index arrays are available.
Consequently, performance programmers typically use the inspector/executor strategy
to take advantage of partial parallelism in sparse computations.
The programmer implements inspector code that builds an iteration dependence
graph at runtime, from which wavefronts of iterations are extracted
and fed into a parallel version of the computation called the executor.
The executor runs the wavefronts sequentially to respect the sparse dependences,
while executing the iterations inside each wavefront in parallel.
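A minimal hand-written sketch of this strategy, using a sparse lower-triangular
solve in CSR form as the running example (the kernel, array names, and OpenMP
scheduling are illustrative assumptions, not generated code):

#include <omp.h>

/* Inspector: derive a wavefront (level) for each row from the runtime
 * contents of the index arrays. Row i may only start once every earlier
 * row j = col[k] (j < i) that it reads has finished. */
static int inspect(int n, const int *rowptr, const int *col, int *level)
{
    int max_level = 0;
    for (int i = 0; i < n; i++) {
        int lvl = 0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            if (col[k] < i && level[col[k]] + 1 > lvl)
                lvl = level[col[k]] + 1;
        level[i] = lvl;
        if (lvl > max_level) max_level = lvl;
    }
    return max_level + 1;                  /* number of wavefronts */
}

/* Executor: wavefronts run one after another to respect the sparse
 * dependences; rows within a wavefront are independent and run in parallel. */
static void execute(int n, const int *rowptr, const int *col,
                    const double *val, const double *b, double *x,
                    const int *level, int nlevels)
{
    for (int w = 0; w < nlevels; w++) {
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < n; i++) {
            if (level[i] != w) continue;   /* unbucketed scan keeps the sketch short */
            double sum = b[i], diag = 1.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
                if (col[k] == i) diag = val[k];
                else             sum -= val[k] * x[col[k]];
            }
            x[i] = sum / diag;
        }
    }
}

A production inspector would additionally bucket each wavefront's iterations into
contiguous lists so the executor does not rescan every row per wavefront; the point
here is only the division of labor between the runtime inspector and the parallel executor.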
To automate the generation of the inspector and executor code,
compiler-based loop-carried data dependence analysis is needed.
However, straightforward, automatically generated inspectors
typically have significantly higher overhead than hand-written, optimized ones.
Consequently, the specific problem that I address in this dissertation is
how to automate the strategies that expert programmers use to generate
efficient runtime inspectors for parallelizing sparse computations.
The overarching contribution of this dissertation is
an approach for encoding properties of individual index arrays,
and relationships between index arrays, as
universally quantified constraints and using them
in compiler-based data dependence analysis.
The dependence analysis is then evaluated in the context of
finding wavefront parallelism in sparse computations.
More specifically,
one contribution is an approach that automatically
uses index array properties
to prove more data dependences unsatisfiable,
removing the need to inspect them at runtime.
Other contributions are methods that use the same properties to simplify
compile-time-satisfiable dependences by finding equalities and subset relationships,
enabling the generation of faster runtime inspectors.
The last contribution is a set of compile-time methods for
expanding opportunities for array privatization in sparse computations
by treating an array as private if its contents start and end
each iteration with the same values.
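For illustration (a hypothetical kernel, not an example from the dissertation),
the workspace w below is all zeros when each outer iteration begins and is
restored to all zeros before the iteration ends, so under this criterion it can
be privatized, giving each thread its own copy and removing the apparent
cross-iteration dependence:

/* w must be zero-initialized before the loop; each iteration scatters into w,
 * consumes it, and zeroes the touched entries again, so w starts and ends
 * every iteration with the same (all-zero) contents and is privatizable. */
void per_row_norms(int n, const int *rowptr, const int *col,
                   const double *val, double *out, double *w /* size n, zeroed */)
{
    for (int i = 0; i < n; i++) {          /* candidate parallel loop */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            w[col[k]] += val[k];           /* scatter row i into the workspace */
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += w[col[k]] * w[col[k]];    /* consume the workspace */
        out[i] = s;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            w[col[k]] = 0.0;               /* restore: w ends as it started */
    }
}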
Evaluation results show that my approach finds seven
fully parallel loops in seven sparse computations where a
previous compiler-based approach could not, and
efficiently extracts partial parallelism from the outermost
loops of five out of six sparse computations.
Automating Wavefront Parallelization for Sparse Matrix Computations
This paper presents a compiler and runtime framework for parallelizing sparse matrix computations that have loop-carried dependences. Our approach automatically generates a runtime inspector to collect data dependence information and achieves wavefront parallelization of the computation, where iterations within a wavefront execute in parallel, and synchronization is required across wavefronts. A key contribution of this paper involves dependence simplification, which reduces the time and space overhead of the inspector. This is implemented within a polyhedral compiler framework, extended for sparse matrix codes. Results demonstrate the feasibility of using automatically-generated inspectors and executors to optimize ILU factorization and symmetric Gauss-Seidel relaxations, which are part of the Preconditioned Conjugate Gradient (PCG) computation. Our implementation achieves a median speedup of 2.97x on 12 cores over the reference sequential PCG implementation, significantly outperforms PCG parallelized using Intel's Math Kernel Library (MKL), and is within 6% of the median performance of manually-parallelized PCG.
Funding: Scientific Discovery through Advanced Computing (SciDAC) program, U.S. Department of Energy Office of Advanced Scientific Computing Research [DE-SC0006947]; NSF [CNS-1302663, CCF-1564074].
Sparse Matrix Code Dependence Analysis Simplification at Compile Time
Analyzing array-based computations to determine data dependences is useful
for many applications including automatic parallelization, race detection,
computation and communication overlap, verification, and shape analysis. For
sparse matrix codes, array data dependence analysis is made more difficult by
the use of index arrays that make it possible to store only the nonzero entries
of the matrix (e.g., in A[B[i]], B is an index array). Here, dependence
analysis is often stymied by such indirect array accesses due to the values of
the index array not being available at compile time. Consequently, many
dependences cannot be proven unsatisfiable or determined until runtime.
Nonetheless, index arrays in sparse matrix codes often have properties such as
monotonicity of index array elements that can be exploited to reduce the amount
of runtime analysis needed. In this paper, we contribute a formulation of array
data dependence analysis that includes encoding index array properties as
universally quantified constraints. This makes it possible to leverage existing
SMT solvers to determine whether such dependences are unsatisfiable and
significantly reduces the number of dependences that require runtime analysis
in a set of eight sparse matrix kernels. Another contribution is an algorithm
for simplifying the remaining satisfiable data dependences by discovering
equalities and/or subset relationships. These simplifications are essential to
make a runtime-inspection-based approach feasible.
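To make the encoding concrete, here is a small illustrative sketch (array names
and notation are assumptions chosen for illustration, not taken verbatim from the
paper). Non-decreasing monotonicity of a CSR row-pointer array can be written as
a universally quantified constraint, and conjoining it with a candidate dependence
that would require a single nonzero position k to belong to two different rows
yields an unsatisfiable system, so that dependence needs no runtime inspection:

% Assumed index-array property: the row-pointer array is non-decreasing.
\forall i_1, i_2 :\; 0 \le i_1 \le i_2 \le n \;\Longrightarrow\; \mathit{rowptr}(i_1) \le \mathit{rowptr}(i_2)

% Candidate loop-carried dependence: nonzero position k lies in row i_1 and in a later row i_2.
\exists i_1, i_2, k :\; i_1 < i_2 \;\wedge\; \mathit{rowptr}(i_1) \le k < \mathit{rowptr}(i_1 + 1) \;\wedge\; \mathit{rowptr}(i_2) \le k < \mathit{rowptr}(i_2 + 1)

Since i_1 + 1 \le i_2 forces \mathit{rowptr}(i_1 + 1) \le \mathit{rowptr}(i_2), the two
interval constraints on k contradict each other, so an SMT solver can discharge this
dependence entirely at compile time.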