Graphs are omnipresent, and GNNs are a powerful family of neural networks for
learning over graphs. Despite their popularity, scaling GNNs either by
deepening or widening suffers from prevalent issues of unhealthy gradients,
over-smoothing, and information squashing, which often lead to sub-standard
performance. In this work, we are interested in exploring a principled way to
scale GNN capacity without deepening or widening, which can improve their
performance across multiple small and large graphs. Motivated by the recent
intriguing phenomenon of model soups, which suggests that the fine-tuned weights
of multiple large pre-trained language models can be merged into a better minimum,
we propose to exploit the fundamentals of model soups to mitigate the
aforementioned memory bottleneck and trainability issues when scaling GNNs. More
specifically, rather than deepening or widening current GNNs, we present a
data-centric perspective of model soups tailored for GNNs: by partitioning giant
graph data, we train multiple comparatively weaker GNNs (soup ingredients)
independently and in parallel without any intermediate communication, and combine
their strengths using a greedy interpolation soup procedure to achieve
state-of-the-art performance. Compared
to concurrent distributed GNN training works such as Jiong et al. (2023), we
train each soup ingredient by sampling different subgraphs per epoch, and the
resulting sub-models are merged only after they are fully trained (rather than at
intermediate stages). Moreover, we provide a wide variety of model soup
preparation techniques by leveraging state-of-the-art graph sampling and graph
partitioning approaches that can handle large graphs. Code is available at:
\url{https://github.com/VITA-Group/graph_ladling}.
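To make the merging step concrete, the following is a minimal sketch of a greedy interpolation soup over independently trained GNN ingredients. It assumes all ingredients share one architecture and that a validation-accuracy callable is available; the function and variable names are illustrative assumptions, not the interface of the released code.

```python
# Hedged sketch: greedily average state dicts of independently trained GNNs,
# keeping an ingredient only if it improves validation accuracy.
import copy
import torch


def average_state_dicts(state_dicts):
    """Uniformly average a list of state dicts with identical keys and shapes."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg


def greedy_soup(ingredients, model, evaluate):
    """
    ingredients: list of state dicts from GNNs trained independently (and in
                 parallel) on different subgraph samples/partitions.
    model:       a GNN instance with the same architecture as the ingredients.
    evaluate:    callable(model) -> validation accuracy (assumed to be provided).
    """
    # Rank ingredients by their individual validation accuracy (best first).
    scored = []
    for sd in ingredients:
        model.load_state_dict(sd)
        scored.append((evaluate(model), sd))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    soup = [scored[0][1]]          # start from the single best ingredient
    best_acc = scored[0][0]
    for acc, sd in scored[1:]:
        candidate = average_state_dicts(soup + [sd])
        model.load_state_dict(candidate)
        cand_acc = evaluate(model)
        if cand_acc >= best_acc:   # keep the ingredient only if the soup improves
            soup.append(sd)
            best_acc = cand_acc

    model.load_state_dict(average_state_dicts(soup))
    return model, best_acc
```

Because each ingredient is fully trained before any merging, the procedure needs no communication during training and only a single round of weight interpolation afterwards.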