A Comprehensive Survey on Distributed Training of Graph Neural Networks
Graph neural networks (GNNs) have proven to be a powerful algorithmic model
across broad application fields owing to their effectiveness in learning over
graphs. To scale GNN training to large-scale and ever-growing
graphs, the most promising solution is distributed training, which distributes
the workload of training across multiple computing nodes. At present, research
on distributed GNN training is vast and fast-moving, and the approaches
proposed in these studies diverge significantly. This makes it difficult for
newcomers to build a comprehensive understanding of the workflows,
computational patterns, communication strategies, and optimization techniques
employed in distributed GNN training, creating a pressing need for a survey
that organizes, analyzes, and compares the work in this field. In this
paper, we provide a comprehensive survey of distributed GNN training by
investigating various optimization techniques used in distributed GNN training.
First, approaches to distributed GNN training are classified into several
categories according to their workflows. In addition, their computational
patterns and communication
patterns, as well as the optimization techniques proposed by recent work are
introduced. Second, the software frameworks and hardware platforms of
distributed GNN training are also introduced for a deeper understanding. Third,
distributed GNN training is compared with distributed training of deep neural
networks, emphasizing the uniqueness of distributed GNN training. Finally,
interesting issues and opportunities in this field are discussed. Comment: To appear in Proceedings of the IEEE.
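The workflow such a survey taxonomizes (graph partitioning, neighbor aggregation with remote-feature fetches, and gradient synchronization) can be roughly illustrated with the following minimal, hypothetical sketch; real systems such as DGL or PyG expose very different APIs, and the graph, features, and gradients here are made-up toy values:

```python
# Hypothetical, framework-agnostic sketch of partition-based distributed
# GNN training; real systems (e.g. DGL, PyG) work very differently.

# A small toy graph: node -> list of neighbors.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

# Step 1: partition the nodes across two workers (edge-cut partitioning).
partitions = [[0, 1], [2, 3]]

def aggregate(node):
    """One toy GNN layer: average the node's own feature with the mean
    of its neighbors' features."""
    neigh = [features[n] for n in graph[node]]
    return (features[node] + sum(neigh) / len(neigh)) / 2

# Step 2: each worker computes embeddings for its own nodes; features of
# neighbors owned by other workers must be fetched over the network,
# which is the communication pattern distributed GNN systems optimize.
embeddings = {}
for worker_nodes in partitions:
    for node in worker_nodes:
        embeddings[node] = aggregate(node)

# Step 3: per-worker gradients are synchronized each step, e.g. by an
# all-reduce average (simulated here with a plain mean).
local_grads = [0.4, 0.6]  # one scalar gradient per worker (assumed values)
global_grad = sum(local_grads) / len(local_grads)
print(embeddings[0], global_grad)  # -> 1.75 0.5
```

The sketch makes the survey's two cost centers visible: step 2 is communication-bound when partitions cut many edges, while step 3 is the same gradient all-reduce used in distributed DNN training.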
Proteus: Simulating the Performance of Distributed DNN Training
DNN models are becoming increasingly larger to achieve unprecedented
accuracy, and the accompanying increased computation and memory requirements
necessitate the employment of massive clusters and elaborate parallelization
strategies to accelerate DNN training. In order to better optimize the
performance and analyze the cost, it is indispensable to model the training
throughput of distributed DNN training. However, complex parallelization
strategies and the resulting complex runtime behaviors make it challenging to
construct an accurate performance model. In this paper, we present Proteus, the
first standalone simulator to model the performance of complex parallelization
strategies through simulated execution. Proteus first models complex
parallelization strategies with a unified representation named Strategy Tree.
Then, it compiles the strategy tree into a distributed execution graph and
simulates complex runtime behaviors, such as computation-communication
(comp-comm) overlap and bandwidth
sharing, with a Hierarchical Topo-Aware Executor (HTAE). We finally evaluate
Proteus across a wide variety of DNNs on three hardware configurations.
Experimental results show that Proteus achieves low average prediction error
and preserves the relative order of training throughput across various
parallelization strategies. Compared to state-of-the-art approaches, Proteus
further reduces prediction error.
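To see why comp-comm overlap makes performance modeling hard, the toy cost model below (a hypothetical sketch; this is not Proteus's Strategy Tree or HTAE, and the per-layer times are assumed values) contrasts a naive additive estimate with one that lets each layer's gradient communication overlap with subsequent layers' compute:

```python
# Toy illustration of comp-comm overlap in distributed training cost
# models. Not Proteus's actual simulator; timings are assumed values.

comp = [4.0, 3.0, 5.0]  # per-layer backward compute times (ms), assumed
comm = [2.0, 6.0, 1.0]  # per-layer gradient sync times (ms), assumed

def naive_estimate(comp, comm):
    """Ignore overlap: total time is just the sum of everything."""
    return sum(comp) + sum(comm)

def overlapped_estimate(comp, comm):
    """Pipeline model: compute is serialized on one stream; each layer's
    communication starts once its compute is done and the comm channel
    is free, overlapping with later layers' compute."""
    t_comp, t_comm = 0.0, 0.0
    for c, m in zip(comp, comm):
        t_comp += c                       # compute stream advances
        t_comm = max(t_comm, t_comp) + m  # comm waits for data + channel
    return max(t_comp, t_comm)

print(naive_estimate(comp, comm))       # -> 21.0
print(overlapped_estimate(comp, comm))  # -> 14.0
```

Even this two-stream model shows a one-third gap versus the additive estimate; real simulators must additionally model bandwidth sharing among concurrent transfers, which is what makes a dedicated executor necessary.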
The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Closing the Gap for Pseudo-Polynomial Strip Packing
Two-dimensional packing problems are a fundamental class of optimization problems, and Strip Packing is one of the most natural and famous among them. Indeed, it can be defined in just one sentence: Given a set of axis-parallel rectangular items and a strip with bounded width and infinite height, the objective is to find a packing of the items into the strip minimizing the packing height. We speak of pseudo-polynomial Strip Packing if we consider algorithms with pseudo-polynomial running time with respect to the width of the strip. It is known that there is no pseudo-polynomial time algorithm for Strip Packing with a ratio better than 5/4 unless P = NP. The best algorithm so far has a ratio of 4/3 + epsilon. In this paper, we close the gap between the inapproximability result and the currently known algorithms by presenting an algorithm with approximation ratio 5/4 + epsilon. The algorithm relies on a new structural result, which is the main accomplishment of this paper. It states that each optimal solution can be transformed with bounded loss in the objective such that it has one of a polynomial number of different forms, thus making the problem tractable by standard techniques, i.e., dynamic programming. To show the conceptual strength of the approach, we extend our result to other problems as well, e.g., Strip Packing with 90-degree rotations and Contiguous Moldable Task Scheduling, and present algorithms with approximation ratio 5/4 + epsilon for these problems as well.
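For a concrete feel of the problem, the classic Next-Fit Decreasing-Height (NFDH) shelf heuristic below packs items onto horizontal shelves. This is only a simple textbook baseline, not the paper's 5/4 + epsilon algorithm, which relies on the structural decomposition and dynamic programming described above; the item set is a made-up example:

```python
# Classic NFDH (Next-Fit Decreasing-Height) shelf heuristic for Strip
# Packing. A simple baseline only; the 5/4 + epsilon result requires a
# much more involved structural argument.

def nfdh(items, strip_width):
    """items: list of (width, height) rectangles. Returns the height of
    the shelf packing produced."""
    # Sort by height, tallest first, so each shelf's height is fixed by
    # its first item and later items on the shelf never stick out.
    items = sorted(items, key=lambda wh: wh[1], reverse=True)
    total_height = 0.0   # height of all closed shelves
    shelf_height = 0.0   # height of the currently open shelf
    x = 0.0              # horizontal fill of the open shelf
    for w, h in items:
        assert w <= strip_width, "item wider than the strip"
        if shelf_height == 0.0 or x + w > strip_width:
            # Item does not fit next to the previous one: close the
            # current shelf and open a new one on top of it.
            total_height += shelf_height
            shelf_height = h
            x = 0.0
        x += w
    return total_height + shelf_height

print(nfdh([(3, 2), (3, 2), (2, 1), (4, 1)], strip_width=6))  # -> 3.0
```

On this toy instance NFDH happens to be optimal, but in general it only guarantees a constant-factor approximation, which is exactly the gap between such simple heuristics and the tight pseudo-polynomial bounds the paper studies.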
Towards efficient end-to-end encryption for container checkpointing systems
Container checkpointing has emerged as a new paradigm for task migration, preemptive scheduling, and elastic scaling of microservices. However, as soon as a snapshot that contains raw memory is exposed through the network or shared storage, sensitive data such as keys and passwords may become compromised. Existing solutions rely on encryption to protect data included in snapshots, but by doing so prevent important performance optimizations such as memory de-duplication and incremental checkpointing. To address these challenges, we design and implement CRIUsec, an efficient end-to-end encryption scheme for container checkpointing systems built on the open-source CRIU (Checkpoint/Restore In Userspace). Our preliminary evaluation shows that CRIUsec integrates seamlessly with popular container platforms (Docker, Podman, Kubernetes), and compared to existing solutions, achieves an average of 1.57× speedup for memory-intensive workloads, and can be up to 100× faster for compute-intensive workloads.
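The tension the abstract describes, that conventional randomized encryption destroys page-level de-duplication, can be sketched as follows. This is an illustrative toy using content-derived keystreams in the style of convergent encryption; it is not CRIUsec's actual scheme, the XOR "cipher" is not secure, and deterministic encryption notoriously leaks page equality:

```python
# Toy model of why randomized snapshot encryption breaks memory
# de-duplication, and how deterministic per-page (convergent-style)
# encryption preserves it. Illustrative only; NOT CRIUsec's design and
# NOT a secure cipher.
import hashlib

PAGE = 8  # toy page size in bytes

def pages(mem):
    return [mem[i:i + PAGE] for i in range(0, len(mem), PAGE)]

def encrypt_whole(image, nonce):
    """Randomized whole-image encryption (keystream depends on position
    and nonce): equal plaintext pages yield unequal ciphertext pages."""
    out = b""
    for i, p in enumerate(pages(image)):
        ks = hashlib.sha256(nonce + i.to_bytes(4, "big")).digest()[:PAGE]
        out += bytes(a ^ b for a, b in zip(p, ks))
    return out

def encrypt_per_page_deterministic(image):
    """Convergent-style: keystream derived from page content, so equal
    pages encrypt identically and can still be de-duplicated."""
    out = []
    for p in pages(image):
        ks = hashlib.sha256(p).digest()[:PAGE]
        out.append(bytes(a ^ b for a, b in zip(p, ks)))
    return out

image = b"AAAAAAAA" * 3 + b"BBBBBBBB"  # three duplicate pages + one unique

unique_plain = len(set(pages(image)))                          # dedups to 2
unique_rand = len(set(pages(encrypt_whole(image, b"nonce"))))  # dedup lost: 4
unique_det = len(set(encrypt_per_page_deterministic(image)))   # preserved: 2
print(unique_plain, unique_rand, unique_det)  # -> 2 4 2
```

The same page-equality property is what makes incremental checkpointing work: only pages whose stored blob changed need to be re-shipped, which a position-randomized ciphertext cannot support.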