10 research outputs found
Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base
Preparing Ginkgo for AMD GPUs -- A Testimonial on Porting CUDA Code to HIP
With AMD reinforcing their ambition in the scientific high performance
computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra
package to feature a HIP backend for AMD GPUs. In this paper, we report and
discuss the porting effort from CUDA, the extension of the HIP framework to add
missing features such as cooperative groups, the performance price of compiling
HIP code for AMD architectures, and the design of a library providing native
backends for NVIDIA and AMD GPUs while minimizing code duplication by using a
shared code base.Comment: Preprint submitted to HeteroPa
Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs
We contribute to the optimization of the sparse matrix-vector product on graphics processing units by introducing a variant of the coordinate sparse matrix layout that compresses the integer representation of the matrix indices. In addition, we employ a look-ahead table to avoid the storage of repeated numerical values in the sparse matrix, yielding a more compact data representation that is easier to maintain in the cache. Our evaluation on the two most recent generations of NVIDIA GPUs, the V100 and the A100 architectures, shows considerable performance improvements over the kernels for the sparse matrix-vector product in cuSPARSE (CUDA 11.0.167).This work was partially sponsored by the EU H2020 project 732631 OPRECOMP and project TIN2017-82972-R of the Spanish MINECO. Hartwig Anzt and Yuhsiang M. Tsai were supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241 and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Steinbuch Centre for Computing (SCC) of the Karlsruhe Institute of Technology for providing access to an NVIDIA A100 GPU
Load-balancing Sparse Matrix Vector Product Kernels on GPUs
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high performance sparse matrix-vector product (SpMV) kernels designed for Nvidia Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in SIMD-friendly Ellpack format (ELL) format. The ratio between the ELL- and the COO-part is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the Suite Sparse matrix collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily-tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format
Recommended from our members
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Gliomas are the most common primary brain malignancies, with different
degrees of aggressiveness, variable prognosis and various heterogeneous
histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic
core, active and non-enhancing core. This intrinsic heterogeneity is also
portrayed in their radio-phenotype, as their sub-regions are depicted by
varying intensity profiles disseminated across multi-parametric magnetic
resonance imaging (mpMRI) scans, reflecting varying biological properties.
Their heterogeneous shape, extent, and location are some of the factors that
make these tumors difficult to resect, and in some cases inoperable. The amount
of resected tumor is a factor also considered in longitudinal scans, when
evaluating the apparent tumor for potential diagnosis of progression.
Furthermore, there is mounting evidence that accurate segmentation of the
various tumor sub-regions can offer the basis for quantitative image analysis
towards prediction of patient overall survival. This study assesses the
state-of-the-art machine learning (ML) methods used for brain tumor image
analysis in mpMRI scans, during the last seven instances of the International
Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we
focus on i) evaluating segmentations of the various glioma sub-regions in
pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue
of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO
criteria, and iii) predicting the overall survival from pre-operative mpMRI
scans of patients that underwent gross total resection. Finally, we investigate
the challenge of identifying the best ML algorithms for each of these tasks,
considering that apart from being diverse on each instance of the challenge,
the multi-institutional mpMRI BraTS dataset has also been a continuously
evolving/growing dataset
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Gliomas are the most common primary brain malignancies, with different
degrees of aggressiveness, variable prognosis and various heterogeneous
histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic
core, active and non-enhancing core. This intrinsic heterogeneity is also
portrayed in their radio-phenotype, as their sub-regions are depicted by
varying intensity profiles disseminated across multi-parametric magnetic
resonance imaging (mpMRI) scans, reflecting varying biological properties.
Their heterogeneous shape, extent, and location are some of the factors that
make these tumors difficult to resect, and in some cases inoperable. The amount
of resected tumor is a factor also considered in longitudinal scans, when
evaluating the apparent tumor for potential diagnosis of progression.
Furthermore, there is mounting evidence that accurate segmentation of the
various tumor sub-regions can offer the basis for quantitative image analysis
towards prediction of patient overall survival. This study assesses the
state-of-the-art machine learning (ML) methods used for brain tumor image
analysis in mpMRI scans, during the last seven instances of the International
Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we
focus on i) evaluating segmentations of the various glioma sub-regions in
pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue
of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO
criteria, and iii) predicting the overall survival from pre-operative mpMRI
scans of patients that underwent gross total resection. Finally, we investigate
the challenge of identifying the best ML algorithms for each of these tasks,
considering that apart from being diverse on each instance of the challenge,
the multi-institutional mpMRI BraTS dataset has also been a continuously
evolving/growing dataset