3 research outputs found

Quantifying the Performance Benefits of Partitioned Communication in MPI

Authors: Gillis, Thomas; Guo, Yanfei; Raffenetti, Ken; Thakur, Rajeev; Zhou, Hui
Publication date: 11/08/2023

Partitioned communication was introduced in MPI 4.0 as a user-friendly interface to support pipelined communication patterns, particularly common in the context of MPI+threads. It provides the user with the ability to divide a global buffer into smaller independent chunks, called partitions, which can then be communicated independently. In this work we first model the performance gain that can be expected when using partitioned communication. Next, we describe the improvements we made to MPICH to enable those gains and provide a high-quality implementation of MPI partitioned communication. We then evaluate partitioned communication in various common use cases and assess the performance in comparison with other MPI point-to-point and one-sided approaches. Specifically, we first investigate two scenarios commonly encountered for small partition sizes in a multithreaded environment: thread contention and overhead of using many partitions. We propose two solutions to alleviate the measured penalty and demonstrate their use. We then focus on large messages and the gain obtained when exploiting the delay resulting from computations or load imbalance. We conclude with our perspectives on the benefits of partitioned communication and the various results obtained.

arXiv.org e-Print Archive

C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives

Authors: Cappello, Franck; Chen, Zizhong; Di, Sheng; Guo, Yanfei; Huang, Jiajun; Liu, Jinyang; Raffenetti, Ken; Thakur, Rajeev; Yu, Xiaodong; Zhai, Yujia; Zhao, Kai; Zhou, Hui
Publication date: 07/04/2023

With the ever-increasing computing power of supercomputers and the growing scale of scientific applications, the efficiency of MPI collective communications turns out to be a critical bottleneck in large-scale distributed and parallel processing. Large message size in MPI collectives is a particularly big concern because it may significantly degrade the overall parallel performance. To address this issue, prior research simply applies off-the-shelf fixed-rate lossy compressors in the MPI collectives, leading to suboptimal performance, limited generalizability, and unbounded errors. In this paper, we propose a novel solution, called C-Coll, which leverages error-bounded lossy compression to significantly reduce the message size, resulting in a substantial reduction in communication cost. The key contributions are three-fold. (1) We develop two general, optimized lossy-compression-based frameworks for both types of MPI collectives (collective data movement as well as collective computation), based on their particular characteristics. Our framework not only reduces communication cost but also preserves data accuracy. (2) We customize an optimized version based on SZx, an ultra-fast error-bounded lossy compressor, which can meet the specific needs of collective communication. (3) We integrate C-Coll into multiple collectives, such as MPI_Allreduce, MPI_Scatter, and MPI_Bcast, and perform a comprehensive evaluation based on real-world scientific datasets. Experiments show that our solution outperforms the original MPI collectives as well as multiple baselines and related efforts by 3.5-9.7X.

Comment: 12 pages, 15 figures, 5 tables, submitted to SC '2

arXiv.org e-Print Archive

Why is MPI so slow?

Authors: Amer, Abdelhalim; Archer, Charles; Balaji, Pavan; Bland, Wesley; Blocksome, Michael; Chuvelev, Michael; Coffman, Paul; Durnov, Dmitry; Fischer, Paul; Fujita, Hajime; Guo, Yanfei; Hatanaka, Masayuki; Janjusic, Tomislav; Jose, Jithin; Langer, Akhil; Min, Misun; Oblomov, Sergey; Oden, Lena; Otten, Matt; Raffenetti, Ken; Rathnayake, Thilina; Sannikov, Alexander; Seo, Sangmin; Si, Min; Sur, Sayantan; Takagi, Masamichi; Zheng, Gengbin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

Juelich Shared Electronic Resources