High-dimensional sparse data emerge in many critical application domains such
as cybersecurity, healthcare, anomaly detection, and trend analysis. To quickly
extract meaningful insights from massive volumes of these multi-dimensional
data, scientists employ unsupervised analysis tools based on tensor
decomposition (TD) methods. However, real-world sparse tensors exhibit highly
irregular shapes, data distributions, and sparsity, which pose significant
challenges for making efficient use of modern parallel architectures. This
study breaks the prevailing assumption that compressing sparse tensors into
coarse-grained structures (i.e., tensor slices or blocks) or along a particular
dimension/mode (i.e., mode-specific) is more efficient than keeping them in a
fine-grained, mode-agnostic form. Our novel sparse tensor representation,
Adaptive Linearized Tensor Order (ALTO), encodes tensors in a compact format
that can be easily streamed from memory and is amenable to both caching and
parallel execution. To demonstrate the efficacy of ALTO, we accelerate popular
TD methods that compute the Canonical Polyadic Decomposition (CPD) model across
a range of real-world sparse tensors. Additionally, we characterize the major
execution bottlenecks of TD methods on multiple generations of the latest Intel
Xeon Scalable processors, including Sapphire Rapids CPUs, and introduce dynamic
adaptation heuristics to automatically select the best algorithm based on the
sparse tensor characteristics. Across a diverse set of real-world data sets,
ALTO outperforms the state-of-the-art approaches, achieving more than an
order-of-magnitude speedup over the best mode-agnostic formats. Compared to the
best mode-specific formats, which require multiple tensor copies, ALTO achieves
more than 5.1x geometric mean speedup at a fraction (25%) of their storage.

Comment: We extend the results of our previous ICS paper to significantly
improve the parallel performance of the Canonical Polyadic Alternating Least
Squares (CP-ALS) algorithm for normally distributed data and the Canonical
Polyadic Alternating Poisson Regression (CP-APR) algorithm for non-negative
count data.
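
To make the idea of a fine-grained, mode-agnostic linearized representation concrete, the following is a minimal sketch of one way to map the multi-dimensional coordinates of a nonzero element to a single integer key and back. It is not the ALTO encoding itself: the round-robin bit-interleaving scheme and the helper names (bits_needed, build_positions, linearize, delinearize) are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def bits_needed(length: int) -> int:
    """Bits required to represent indices 0..length-1 (at least one bit)."""
    return max(1, (length - 1).bit_length())


def build_positions(dims: Sequence[int]) -> List[List[int]]:
    """Assign each mode the bit positions it occupies in the linear key.

    Bits are handed out round-robin across modes, from least significant
    upward; a mode stops receiving bits once its index range is covered,
    so short modes do not waste space in the key.
    """
    nbits = [bits_needed(d) for d in dims]
    positions: List[List[int]] = [[] for _ in dims]
    pos = 0
    for level in range(max(nbits)):
        for mode, nb in enumerate(nbits):
            if level < nb:
                positions[mode].append(pos)
                pos += 1
    return positions


def linearize(coord: Sequence[int], positions: List[List[int]]) -> int:
    """Scatter the bits of every mode index into one integer key."""
    key = 0
    for idx, bitpos in zip(coord, positions):
        for level, p in enumerate(bitpos):
            key |= ((idx >> level) & 1) << p
    return key


def delinearize(key: int, positions: List[List[int]]) -> Tuple[int, ...]:
    """Gather each mode's bits back out of the key (inverse of linearize)."""
    coord = []
    for bitpos in positions:
        idx = 0
        for level, p in enumerate(bitpos):
            idx |= ((key >> p) & 1) << level
        coord.append(idx)
    return tuple(coord)


# Hypothetical 3-mode tensor of shape 4 x 2 x 8 with a nonzero at (3, 1, 5).
dims = (4, 2, 8)
positions = build_positions(dims)
key = linearize((3, 1, 5), positions)
assert delinearize(key, positions) == (3, 1, 5)
```

Because every nonzero is stored as a single key plus its value, no mode is privileged: the same array can be traversed for a computation along any mode, and sorting by key tends to place elements that are close in all modes near one another in memory.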