An MPI-IO interface to HPSS
This paper describes an implementation of the proposed MPI-IO (Message Passing Interface - Input/Output) standard for parallel I/O. Our system uses third-party transfer to move data over an external network between the processors where it is used and the I/O devices where it resides. Data travels directly from source to destination, without being shuffled among processors or funneled through a central node. Our distributed server model lets multiple compute nodes share the burden of coordinating data transfers. The system is built on the High Performance Storage System (HPSS), and a prototype version runs on a Meiko CS-2 parallel computer.
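As a rough illustration of the programming interface targeted here, the following C sketch shows a group of MPI processes writing disjoint blocks of a shared file through standard MPI-IO calls. The file name and block size are illustrative assumptions; in the system described, the underlying transfers to HPSS would be serviced by third-party transfer rather than routed through a central node.

/* Minimal MPI-IO sketch: each rank writes its own contiguous block of a
 * shared file.  File name and block size are illustrative. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int count = 1 << 20;                 /* doubles per rank */
    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) buf[i] = (double)rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at its own offset; no data funnels through one node. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at(fh, offset, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}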
Communication overhead on the Intel Paragon, IBM SP2 and Meiko CS-2
Interprocessor communication overhead is a crucial measure of the power of parallel computing systems; its impact can severely limit the performance of parallel programs. This report presents measurements of communication overhead on three contemporary commercial multicomputer systems: the Intel Paragon, the IBM SP2 and the Meiko CS-2. In each case the time to communicate between processors is presented as a function of message length. The time for global synchronization and memory access is discussed. The performance of these machines in emulating hypercubes and executing random pairwise exchanges is also investigated. It is shown that interprocessor communication time depends heavily on the specific communication pattern required. These observations contradict the commonly held belief that communication overhead on contemporary machines is independent of the placement of tasks on processors. The information presented in this report permits the evaluation of the efficiency of parallel algorithm implementations against standard baselines.
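Measurements of this kind can be reproduced with a simple ping-pong benchmark. The C sketch below times round trips between two MPI ranks over a range of message lengths; the repetition count and message sizes are illustrative choices, not the report's exact methodology.

/* Ping-pong sketch: one-way latency between ranks 0 and 1 as a function of
 * message length.  Sizes and repetition count are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 100;
    char *buf = malloc(1 << 20);

    for (int len = 1; len <= (1 << 20); len *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++) {
            if (rank == 0) {
                MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time */
        if (rank == 0)
            printf("%8d bytes  %.3f us\n", len, t * 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}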
Multiphase complete exchange on Paragon, SP2 and CS-2
The overhead of interprocessor communication is a major factor limiting the performance of parallel computer systems. The complete exchange is the most demanding communication pattern in that it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. On hypercubes, the multiphase complete exchange has been developed and shown to provide optimal performance over varying message sizes. Most commercial multicomputer systems do not have a hypercube interconnect. However, they use special-purpose hardware and dedicated communication processors to achieve very high performance communication and can be made to emulate the hypercube quite well. Multiphase complete exchange has been implemented on three contemporary parallel architectures: the Intel Paragon, IBM SP2 and Meiko CS-2. The essential features of these machines are described and their basic interprocessor communication overheads are discussed. The performance of multiphase complete exchange is evaluated on each machine. It is shown that the theoretical ideas developed for hypercubes are also applicable in practice to these machines, and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
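For orientation, the C sketch below shows the simplest direct complete exchange on a power-of-two number of MPI ranks, in which step s pairs each rank with partner rank XOR s and one block travels in each direction. The multiphase algorithm studied here instead groups hypercube dimensions into phases, combining blocks to trade the number of messages against their size; the block size used below is an illustrative assumption.

/* Direct (single-phase) complete exchange on a power-of-two rank count. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);       /* assumed to be a power of two */

    const int block = 4096;                   /* doubles per destination */
    double *sendbuf = malloc((size_t)p * block * sizeof(double));
    double *recvbuf = malloc((size_t)p * block * sizeof(double));
    for (int i = 0; i < p * block; i++) sendbuf[i] = (double)rank;

    /* The block destined for this rank needs no communication. */
    for (int i = 0; i < block; i++)
        recvbuf[rank * block + i] = sendbuf[rank * block + i];

    /* Step s pairs each rank with partner rank ^ s. */
    for (int s = 1; s < p; s++) {
        int partner = rank ^ s;
        MPI_Sendrecv(sendbuf + (size_t)partner * block, block, MPI_DOUBLE, partner, 0,
                     recvbuf + (size_t)partner * block, block, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}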
Parallel computing and quantum simulations/011
Our goal was to investigate the suitability of parallel supercomputer architectures for Quantum Monte Carlo (QMC). Because QMC allows one to study the properties of ions and electrons in a solid, it has important applications to condensed matter physics, chemistry, and materials science. Our specific research plan was to: (1) adapt quantum simulation codes, highly optimized for vector supercomputers, to run on the Intel Hypercube and the Thinking Machines CM-5; (2) identify architectural bottlenecks in communication, floating-point computation, and node memory, and determine scalability with the number of nodes; (3) identify algorithmic changes required to take advantage of current and prospective architectures. We have made significant progress towards these goals. We explored implementations of the p4 parallel programming system and the Message Passing Interface (MPI) libraries to run "world-line" and "determinant" QMC and Molecular Dynamics simulations on both workstation clusters (HP, Sparc, AIX, Linux) and massively parallel supercomputers (Intel iPSC/860, Meiko CS-2, IBM SP-X, Intel Paragon). We addressed issues of the efficiency of parallelization as a function of the distribution of the problem over the nodes and the length scale of the interactions between particles. Both choices influence the frequency of inter-node communication and the size of the messages passed. We found that, using the message-passing paradigm on an appropriate machine (e.g., the Intel iPSC/860), an essentially linear speedup could be obtained.
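As a hedged illustration of the communication regime behind that observation, the C sketch below distributes particles over MPI ranks in a one-dimensional domain decomposition and exchanges only fixed-size boundary data with nearest neighbours each step, so message size and frequency stay constant as nodes are added. All names and sizes are assumptions for illustration, not the project's actual codes.

/* 1-D domain-decomposition sketch with nearest-neighbour halo exchange,
 * assuming short-range interactions.  Sizes are illustrative. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    const int local = 10000;                 /* particles owned per rank */
    const int halo  = 64;                    /* boundary particles exchanged */
    double *x = calloc(local + 2 * halo, sizeof(double));

    int left  = (rank - 1 + p) % p;
    int right = (rank + 1) % p;

    for (int step = 0; step < 100; step++) {
        /* Send left edge to the left neighbour, receive right ghost layer. */
        MPI_Sendrecv(x + halo, halo, MPI_DOUBLE, left, 0,
                     x + halo + local, halo, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send right edge to the right neighbour, receive left ghost layer. */
        MPI_Sendrecv(x + local, halo, MPI_DOUBLE, right, 1,
                     x, halo, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... local Monte Carlo / molecular-dynamics update would go here ... */
    }

    free(x);
    MPI_Finalize();
    return 0;
}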