Distributed Graph Neural Network Training with Periodic Stale Representation Synchronization
Despite the recent success of Graph Neural Networks, it remains challenging
to train a GNN on large graphs with millions of nodes and billions of edges,
which are prevalent in many graph-based applications. Traditional
sampling-based methods accelerate GNN training by dropping edges and nodes,
which impairs graph integrity and model performance. In contrast,
distributed GNN algorithms accelerate GNN training by utilizing multiple
computing devices and can be classified into two types: "partition-based"
methods enjoy low communication costs but suffer from information loss due to
dropped edges, while "propagation-based" methods avoid information loss but
suffer from prohibitive communication overhead caused by the neighbor
explosion. To jointly address these problems, this paper proposes DIGEST
(DIstributed Graph reprEsentation SynchronizaTion), a novel distributed GNN
training framework that synergizes the complementary strengths of both
categories of existing methods. We propose to allow each device to utilize the
stale representations of its neighbors in other subgraphs during subgraph
parallel training. This way, our method preserves global graph information from
neighbors to avoid information loss and reduce communication costs. Our
convergence analysis demonstrates that DIGEST enjoys a state-of-the-art
convergence rate. Extensive experimental evaluation on large, real-world graph
datasets shows that DIGEST achieves up to a 21.82× speedup without compromising
performance compared to state-of-the-art distributed GNN training frameworks.
Comment: Preprint, 20 pages, 9 figures
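The core idea above, letting each device reuse cached, periodically refreshed embeddings of neighbors held on other devices, can be sketched as follows. This is a minimal illustration, not DIGEST's actual implementation; the class name `StaleCache` and the `refresh_every` parameter are assumptions for the sketch.

```python
class StaleCache:
    """Caches embeddings of boundary neighbors that live on other subgraphs.

    A fresh embedding is fetched (i.e., communicated across devices) only
    every `refresh_every` steps; otherwise the stale cached copy is reused,
    preserving global neighbor information while cutting communication.
    """

    def __init__(self, refresh_every):
        self.refresh_every = refresh_every
        self.cache = {}   # node id -> cached (possibly stale) embedding
        self.step = 0

    def get(self, node, fetch_fresh):
        # Refresh only on a cache miss or at the periodic sync step,
        # so cross-device traffic is periodic rather than per-step.
        if node not in self.cache or self.step % self.refresh_every == 0:
            self.cache[node] = fetch_fresh(node)
        return self.cache[node]

    def tick(self):
        self.step += 1
```

With `refresh_every=2`, three consecutive training steps touching the same remote neighbor trigger only two fetches: one on the initial miss and one at the next synchronization step; the intermediate step reuses the stale copy.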
Reducing communication in graph neural network training
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.
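The simplest of the partitionings mentioned above, the 1D scheme, can be sketched with the sparse-dense product A @ H that underlies one GNN aggregation step. In a real run each row block of A lives on a different GPU and the feature matrix H is assembled with a collective such as `torch.distributed.all_gather`; here the "processes" are simulated serially with NumPy, and the function name `spmm_1d` is illustrative, not the paper's API.

```python
import numpy as np

def spmm_1d(A_blocks, H_blocks):
    """1D row-partitioned sparse-dense multiply, simulated serially.

    Each "process" owns one row block of the adjacency matrix A and one
    row block of the features H. Its rows of A may reference any node,
    so the full H must first be gathered from all processes.
    """
    # Communication step: all-gather the feature blocks into the full H.
    # Per-process volume is O(n * f); the 1.5D/2D/3D variants trade
    # replication for asymptotically less traffic.
    H_full = np.vstack(H_blocks)
    # Local compute: each process multiplies its row block against H.
    return [A_i @ H_full for A_i in A_blocks]

rng = np.random.default_rng(0)
A = (rng.random((8, 8)) < 0.3).astype(float)   # toy sparse adjacency
H = rng.random((8, 4))                          # node features
A_blocks = np.split(A, 2)                       # one row block per "process"
H_blocks = np.split(H, 2)
out = np.vstack(spmm_1d(A_blocks, H_blocks))
assert np.allclose(out, A @ H)  # partitioned result matches serial product
```

The design point the abstract highlights is exactly this gather: because A is sparse, the dense communication dominates the cheap local compute, which is why reorganizing the partitioning (1.5D, 2D, 3D) can asymptotically reduce total traffic.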