Performance of numerical algorithms for low-rank tensor operations in tensor-train / matrix-product-states format

Abstract

This talk discusses the node-level performance of numerical algorithms for handling high-dimensional problems in a compressed tensor format. It focuses on two problems in particular: (1) approximating large (dense) data (lossy compression) and (2) solving linear systems in the tensor-train / matrix-product-states format. For both problems, we optimize the required underlying linear algebra operations, as well as the mapping of the high-level algorithm to (potentially less accurate) lower-level operations. In particular, we suggest improvements for the costly orthogonalization and truncation steps based on a high-performance implementation of a "Q-less" tall-skinny QR (TSQR) decomposition. Further optimizations for solving linear systems include memory-layout optimizations for faster tensor contractions and a simple generic preconditioner. We show performance results on today's multi-core CPUs, where we obtain speedups of up to ~50x over the reference implementation for the lossy compression and up to ~5x for solving linear systems.
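To illustrate the "Q-less" TSQR idea mentioned above, the following is a minimal sketch, not the talk's actual high-performance implementation: the tall-skinny matrix is reduced block-row by block-row, keeping only the triangular factors, so the orthogonal factor Q is never formed. The function name, block size, and two-level reduction are illustrative assumptions.

import numpy as np

def qless_tsqr_r(X, block_rows=4096):
    """Compute only the triangular factor R of a tall-skinny QR.

    Each row block is factored independently and the stacked R
    factors are reduced once more; Q is never stored, which cuts
    memory traffic when X has many more rows than columns.
    """
    m, n = X.shape
    assert m >= n, "expects a tall-skinny matrix"
    # QR of each row block, keeping only R (mode='r' skips Q)
    rs = [np.linalg.qr(X[i:i + block_rows], mode='r')
          for i in range(0, m, block_rows)]
    # Reduce the stacked n-by-n triangles to a single R factor
    return np.linalg.qr(np.vstack(rs), mode='r')

If the orthogonal factor is needed later (e.g., for the orthogonalization sweep), it can be applied implicitly as Q = X R^{-1} via a triangular solve, in the spirit of CholeskyQR; this trades a small accuracy loss for much cheaper lower-level operations.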
