CORE
🇺🇦
make metadata, not war
Services
Services overview
Explore all CORE services
Access to raw data
API
Dataset
FastSync
Content discovery
Recommender
Discovery
OAI identifiers
OAI Resolver
Managing content
Dashboard
Bespoke contracts
Consultancy services
Support us
Support us
Membership
Sponsorship
Community governance
Advisory Board
Board of supporters
Research network
About
About us
Our mission
Team
Blog
FAQs
Contact us
performance test and analysis of alltoall collective communication on domestic hundred trillion times cluster system
Authors
张云泉
李玉成
饶立
Publication date
1 January 2010
Publisher
Abstract
随着高性能计算机的应用和发展,并行应用程序所使用的处理器数越来越多,进程间的通信量也不断增多,这对应用程序的性能有很大影响。在采用一种快速傅里叶变换HFFT对曙光5000A进行性能测试时发现,MPI集合通信函数MPI Alltoall的巨大通信开销是并行程序设计的瓶颈。为此,对现有主流Alltoall算法在曙光5000A和深腾7000上进行性能测试与分析,以期对未来的Alltoall算法的优化工作做出贡献。利用不同消息长度和不同进程数测试了Alltoall函数多种算法的性能,这些算法包括二维网格算法、三维网格算法、Bruck算法、原始算法、成对交换算法、递归倍增算法、环算法以及LAM/MPI中的简单算法等。实验结果表明:消息长度较小时,在曙光5000A上采用原始算法和Bruck算法的性能较好,而在深腾7000上用时较少的算法是简单算法和Bruck算法;对于长消息,曙光5000A上最优的算法是环算法,深腾7000上成对交换性能最优
Similar works
Full text
Available Versions
Institute Of Software, Chinese Academy Of Sciences
See this paper in CORE
Go to the repository landing page
Download from data provider
oai:ir.iscas.ac.cn:311060/9852
Last time updated on 30/12/2017