A Unified Optimization Approach for Sparse Tensor Operations on GPUs
Sparse tensors appear in many large-scale applications with multidimensional
and sparse data. While multidimensional sparse data often need to be processed
on manycore processors, attempts to develop highly-optimized GPU-based
implementations of sparse tensor operations are rare. The irregular computation
patterns and sparsity structures as well as the large memory footprints of
sparse tensor operations make such implementations challenging. We leverage the
fact that sparse tensor operations share similar computation patterns to
propose a unified tensor representation called F-COO. Combined with
GPU-specific optimizations, F-COO provides highly-optimized implementations of
sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated on tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM), which are used in tensor decomposition algorithms. Compared to state-of-the-art work, we improve the performance of SpTTM and SpMTTKRP by up to 3.7 and 30.6 times, respectively, on NVIDIA Titan-X GPUs. We implement a CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup with the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.
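As a concrete reference for the kernel being accelerated, here is a minimal NumPy sketch of a mode-0 MTTKRP over a plain COO-format sparse 3-way tensor. It is a serial illustration only, not the paper's F-COO format or GPU implementation; F-COO additionally carries per-mode flag arrays so that one representation serves all modes and operations.

```python
import numpy as np

def mttkrp_mode0(indices, values, B, C):
    """M = X_(0) (C khatri-rao B) for a sparse 3-way tensor X in COO form.

    indices : (nnz, 3) integer array of (i, j, k) coordinates
    values  : (nnz,) nonzero values
    B, C    : dense factor matrices of shapes (J, R) and (K, R)
    """
    I, R = indices[:, 0].max() + 1, B.shape[1]
    # Each nonzero x_ijk contributes x_ijk * (B[j] * C[k]) to row i of M.
    contrib = values[:, None] * B[indices[:, 1]] * C[indices[:, 2]]
    M = np.zeros((I, R))
    np.add.at(M, indices[:, 0], contrib)  # scatter-add over repeated row indices
    return M

# Tiny usage example: a 2x3x4 tensor with 5 nonzeros, rank 2.
rng = np.random.default_rng(0)
idx = np.array([[0, 0, 1], [0, 2, 3], [1, 1, 0], [1, 2, 2], [1, 0, 3]])
print(mttkrp_mode0(idx, rng.random(5), rng.random((3, 2)), rng.random((4, 2))))
```

The scatter-add over repeated row indices is exactly the irregular access pattern the abstract refers to, which is what makes an efficient GPU mapping nontrivial.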
Parallel Sparse Tensor Decomposition in Chapel
In big-data analytics, tensor decomposition is a popular technique for extracting patterns from large, sparse multivariate data. Many challenges exist
for designing parallel, high performance tensor decomposition algorithms due to
irregular data accesses and the growing size of tensors that are processed.
There have been many efforts at implementing shared-memory algorithms for
tensor decomposition, most of which have focused on the traditional C/C++ with
OpenMP framework. However, Chapel is becoming an increasingly popular programming language due to its expressiveness and simplicity for writing scalable parallel programs. In this work, we port a state-of-the-art C/OpenMP parallel sparse tensor decomposition tool, SPLATT, to Chapel. We present a
performance study that investigates bottlenecks in our Chapel code and
discusses approaches for improving its performance. Also, we discuss features
in Chapel that would have been beneficial to our porting effort. We demonstrate
that our Chapel code is competitive with the C/OpenMP code in both runtime and scalability, achieving 83%-96% of the original code's performance and near-linear scalability up to 32 cores.
Comment: 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 5th Annual Chapel Implementers and Users Workshop (CHIUW 2018).
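To make the shared-memory challenge concrete, the sketch below shows one common tactic such codes rely on: privatizing the output buffer per worker, so each thread accumulates its own chunk of nonzeros without locks or atomics, followed by a single reduction. The function name and structure are illustrative assumptions, not SPLATT's or the Chapel port's actual scheme.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def mttkrp_privatized(indices, values, B, C, nworkers=4):
    """Mode-0 MTTKRP where each worker owns a private output buffer."""
    I, R = indices[:, 0].max() + 1, B.shape[1]
    buffers = np.zeros((nworkers, I, R))  # one private buffer per worker
    chunks = np.array_split(np.arange(len(values)), nworkers)

    def work(w):
        ix = indices[chunks[w]]
        contrib = values[chunks[w]][:, None] * B[ix[:, 1]] * C[ix[:, 2]]
        np.add.at(buffers[w], ix[:, 0], contrib)  # no sharing, so no locks needed

    with ThreadPoolExecutor(nworkers) as pool:
        list(pool.map(work, range(nworkers)))  # force execution, surface errors
    return buffers.sum(axis=0)  # one reduction replaces per-nonzero atomics
```

The trade-off is the extra memory for per-worker buffers, which is one reason production codes partition and tile the tensor more carefully.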
Algorithms for Large-Scale Sparse Tensor Factorization
University of Minnesota Ph.D. dissertation. April 2019. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); xiv, 153 pages.

Tensor factorization is a technique for analyzing data that feature interactions along three or more axes, or modes. Many fields, such as retail, health analytics, and cybersecurity, use tensor factorization to gain useful insights and make better decisions. The tensors that arise in these domains are increasingly large, sparse, and high dimensional. Factoring these tensors is computationally expensive, if not infeasible. The ubiquity of multi-core processors and large-scale clusters motivates the development of scalable parallel algorithms to facilitate these computations. However, sparse tensor factorizations often achieve only a small fraction of potential performance due to challenges including data-dependent parallelism and memory accesses, high memory consumption, and frequent fine-grained synchronizations among compute cores. This thesis presents a collection of algorithms for factoring sparse tensors on modern parallel architectures. This work focuses on developing algorithms that are scalable while being memory- and operation-efficient. We address a number of challenges across various forms of tensor factorization and emphasize results on large, real-world datasets.
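For context, the sketch below shows one alternating least squares (ALS) sweep of the CP factorization that such work targets, for a 3-way COO tensor. The helper names are hypothetical and the code is a minimal serial reference; the dissertation's contribution is precisely making steps like these scalable, memory-efficient, and operation-efficient.

```python
import numpy as np

def mttkrp(indices, values, factors, mode, dim):
    """Scatter-add MTTKRP for one mode of a 3-way COO sparse tensor."""
    contrib = values[:, None]
    for m in range(3):
        if m != mode:
            contrib = contrib * factors[m][indices[:, m]]
    M = np.zeros((dim, factors[0].shape[1]))
    np.add.at(M, indices[:, mode], contrib)
    return M

def cp_als_sweep(indices, values, factors, dims):
    """One ALS pass: update each factor matrix in turn."""
    for mode in range(3):
        M = mttkrp(indices, values, factors, mode, dims[mode])
        # Hadamard product of the Gram matrices of the other two factors.
        G = np.ones((factors[0].shape[1],) * 2)
        for m in range(3):
            if m != mode:
                G = G * (factors[m].T @ factors[m])
        factors[mode] = M @ np.linalg.pinv(G)
    return factors

# Tiny usage example: a random 4x5x6 tensor with 20 nonzeros, rank 3.
rng = np.random.default_rng(0)
dims, rank = (4, 5, 6), 3
idx = rng.integers(0, dims, size=(20, 3))
factors = [rng.random((d, rank)) for d in dims]
factors = cp_als_sweep(idx, rng.random(20), factors, dims)
```

The MTTKRP dominates each sweep's cost, which is why the abstracts above focus on accelerating that kernel.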