The design and implementation of a high performance communication network are critical factors in determining the performance and cost-effectiveness of a largescale computing system. The major issues center on the trade-off between the network cost and the impact of latency and bandwidth on application performance. One promising technique for extracting maximum application performance given limited network resources is based on overlapping computation with communication, which partially or entirely hides communication delays. While this approach is not new, there are few studies that quantify the potential benefit of such overlapping for large-scale production scientific codes. We address this with an empirical method combined with a network model to quantify the potential overlap in several codes and examine the possible performance benefit. Our results demonstrate, for the codes examined, that a high potential tolerance to network latency and bandwidth exists because of a high degree of potential overlap. Moreover, our results indicate that there is often no need to use fine-grained communication mechanisms to achieve this benefit, since the major source of potential overlap is found in independent work—computation on which pending messages does not depend. This allows for a potentially significant relaxation of network requirements without a consequent degradation of application performance
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.