Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids

Grüneis, Andreas; Irmler, Andreas; Kanakagiri, Raghavendra; Ohlmann, Sebastian T.; Solomonik, Edgar

Authors: Andreas Grüneis
Andreas Irmler
Raghavendra Kanakagiri
Sebastian T. Ohlmann
Edgar Solomonik
Publication date: 17 July 2023

Abstract

We propose an algorithm that aims to minimize the inter-node communication volume for distributed and memory-efficient tensor contraction schemes on modern multi-core compute nodes. The key idea is to define processor grids that optimize the intra-/inter-node communication volume in the employed contraction algorithms. We present an implementation of the proposed node-aware communication algorithm in the Cyclops Tensor Framework (CTF). We demonstrate that this implementation achieves significantly improved performance for matrix-matrix multiplication and tensor contractions on up to several hundred modern compute nodes compared to conventional implementations that do not use node-aware processor grids. Our implementation shows good performance when compared with existing state-of-the-art parallel matrix multiplication libraries (COSMA and ScaLAPACK). In addition to discussing the performance for matrix-matrix multiplication, we also investigate the performance of our node-aware communication algorithm for tensor contractions as they occur in quantum chemical coupled-cluster methods. To this end, we employ a modified version of CTF in combination with a coupled-cluster code (Cc4s). Our findings show that the node-aware communication algorithm also improves the performance of coupled-cluster theory calculations for real-world problems running on tens to hundreds of compute nodes.

Comment: 15 pages, 4 figures
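
The key idea stated in the abstract, choosing how ranks are placed on the processor grid so that less traffic crosses node boundaries, can be illustrated with a toy model. The sketch below is not the paper's algorithm and does not use the CTF API. It assumes a 4 x 4 grid of 16 ranks on 4 nodes with 4 ranks each, and it assumes a SUMMA-like scheme in which every grid row and every grid column forms one broadcast group. The cost metric (number of distinct nodes in a group minus one) and all function names are hypothetical choices made for illustration only.

```python
# Toy model: count inter-node links needed by row and column broadcast groups
# of a 2D processor grid under two rank-to-node placements.
# All names and the cost metric are illustrative assumptions.


def node_of_row_major(i, j, grid_cols, ranks_per_node):
    """Node id when ranks are numbered row-major and packed onto nodes in order."""
    rank = i * grid_cols + j
    return rank // ranks_per_node


def node_of_blocked(i, j, grid_cols, block_rows, block_cols):
    """Node id when each node owns a compact block_rows x block_cols tile of the grid."""
    blocks_per_row = grid_cols // block_cols
    return (i // block_rows) * blocks_per_row + (j // block_cols)


def inter_node_links(grid_rows, grid_cols, node_of):
    """Sum over all row and column groups of (distinct nodes in the group - 1)."""
    total = 0
    for i in range(grid_rows):
        nodes_in_row = {node_of(i, j) for j in range(grid_cols)}
        total += len(nodes_in_row) - 1
    for j in range(grid_cols):
        nodes_in_col = {node_of(i, j) for i in range(grid_rows)}
        total += len(nodes_in_col) - 1
    return total


# 16 ranks, 4 ranks per node, arranged as a 4 x 4 grid.
row_major_links = inter_node_links(
    4, 4, lambda i, j: node_of_row_major(i, j, grid_cols=4, ranks_per_node=4)
)
blocked_links = inter_node_links(
    4, 4, lambda i, j: node_of_blocked(i, j, grid_cols=4, block_rows=2, block_cols=2)
)

print("row-major placement:", row_major_links)  # 12
print("2 x 2 node tiles:   ", blocked_links)    # 8
```

In this toy setting, the row-major placement keeps each grid row on a single node but spreads every column over all four nodes, whereas the tiled placement makes every row and every column span only two nodes, which lowers the total count from 12 to 8. A realistic cost model would also have to weigh message sizes and the contraction algorithm actually used.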

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2307.08829

Last time updated on 26/07/2023