This talk discusses the node-level performance of numerical algorithms for handling high-dimensional problems in a compressed tensor format.
It focusses on two problems in particular: (1) approximating large (dense) data (lossy compression) and (2) solving linear systems in the tensor-train / matrix-product states format.
For both problems, we optimize the required underlying linear algebra operations, as well as the mapping of the high-level algorithm onto (potentially less accurate) lower-level operations. In particular, we suggest improvements for the costly orthogonalization and truncation steps based on a high-performance implementation of a "Q-less" tall-skinny QR (TSQR) decomposition.
Further optimizations for solving linear systems include memory layout optimizations for faster tensor contractions and a simple generic preconditioner.
We show performance results on today's multi-core CPUs, where we obtain a speedup of up to ~50x over the reference implementation for the lossy compression, and of up to ~5x for solving linear systems.
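To illustrate the "Q-less" TSQR idea mentioned above, here is a minimal NumPy sketch (not the talk's actual high-performance implementation): the tall-skinny matrix is split into row blocks, each block is reduced to its triangular factor, and the stacked factors are reduced once more. Only R is ever formed; the explicit orthogonal factor Q is never materialized, which is what makes the method cheap. The function name and block count are illustrative assumptions.

```python
import numpy as np

def tsqr_R(X, n_blocks=4):
    """Q-less TSQR sketch (illustrative): compute only the triangular
    factor R of X = QR by reducing block-wise R factors."""
    # Split the tall matrix into row blocks (in a real TSQR these are
    # processed in parallel / in a tree reduction).
    blocks = np.array_split(X, n_blocks, axis=0)
    # Per-block QR, keeping only the small triangular factors.
    Rs = [np.linalg.qr(b, mode='r') for b in blocks]
    # One more QR on the stacked R factors yields R of the full matrix.
    return np.linalg.qr(np.vstack(Rs), mode='r')

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
R = tsqr_R(X)
# R is only determined up to row signs, but R^T R must equal X^T X.
assert np.allclose(R.T @ R, X.T @ X)
```

If an orthogonal basis is needed later (e.g. for truncation), it can be recovered implicitly via a triangular solve against R instead of storing Q, which is the trade-off the "Q-less" variant exploits.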