32 research outputs found

Evaluation of technologies of parallel computers' communication networks for a real-time triggering application in a high-energy physics experiment at CERN

Author: Hörtnagl C
Publication venue: Union of Concerned Scientists
Publication date: 01/01/1997

LoGPC: Modeling Network Contention in Message-Passing Programs

Author: Csaba Andras Moritz
Matthew I. Frank

In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11, 18] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50% of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify tradeoffs between synchronous and asynchronous message passing styles.
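For readers unfamiliar with the models named in the abstract above, the following is a minimal sketch of the standard LogGP point-to-point cost that LoGPC builds on. It is not taken from the paper: the class and function names are hypothetical, the parameter values in the usage note are made up, and the contention delay is treated as an externally supplied number rather than derived the way LoGPC derives it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """Standard LogGP parameters (all times in the same unit, e.g. cycles)."""
    L: float  # network latency
    o: float  # per-message processor overhead (paid by sender and receiver)
    g: float  # minimum gap between consecutive short messages
    G: float  # gap per byte for long messages
    P: int    # number of processors


def short_message_time(p: LogGPParams) -> float:
    """LogP cost of one short message: send overhead + latency + receive overhead."""
    return p.o + p.L + p.o


def long_message_time(p: LogGPParams, k: int) -> float:
    """LogGP cost of one k-byte message: o + (k - 1) * G + L + o."""
    if k < 1:
        raise ValueError("message size k must be at least 1 byte")
    return p.o + (k - 1) * p.G + p.L + p.o


def contention_fraction(contention_free_time: float, contention_delay: float) -> float:
    """Share of total time spent on contention.

    ASSUMPTION: contention_delay is a placeholder input for illustration;
    this helper does not reproduce how LoGPC computes contention.
    """
    total = contention_free_time + contention_delay
    return contention_delay / total if total > 0 else 0.0
```

With the made-up values `LogGPParams(L=10, o=2, g=4, G=0.5, P=32)`, `long_message_time(params, 101)` evaluates to 64.0, and a contention delay equal to that contention-free time gives `contention_fraction(64.0, 64.0) == 0.5`, the kind of 50% share the abstract reports as an upper bound.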

CiteSeerX

Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration

Author: Carretero Jesús
Marinescu Maria-Cristina
Martín Gonzalo
Singh David E.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

The work in this paper focuses on providing malleability to MPI applications by using a novel performance-aware dynamic reconfiguration technique. This paper describes the design and implementation of Flex-MPI, an MPI library extension which can automatically monitor and predict the performance of applications, balance and redistribute the workload, and reconfigure the application at runtime by changing the number of processes. Unlike existing approaches, our reconfiguring policy is guided by user-defined performance criteria. We focus on iterative SPMD programs, a class of applications with critical mass within the scientific community. Extensive experiments show that Flex-MPI can improve the performance, parallel efficiency, and cost-efficiency of MPI programs with minimal effort from the programmer.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under the project TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems, and by the EU under the COST Program Action IC1305, Network for Sustainable Ultrascale Computing (NESUS). Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

High-performance data-parallel input/output

Author: Moore Jason Andrew
Publication venue: 'Oregon State University'

Existing parallel file systems are proving inadequate in two important arenas: programmability and performance. Both of these inadequacies can largely be traced to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented, single-stream, block-at-a-time approach to file I/O. This one-size-fits-all approach to parallel file systems is inadequate for supporting applications running on distributed-memory parallel computers. This research provides a migration path away from the traditional approaches to parallel I/O at two levels. At the level seen by the programmer, we show how file operations can be closely integrated with the semantics of a parallel language. Principles for this integration are illustrated in their application to C*, a virtual-processor-oriented language. The result is that traditional C file operations with familiar semantics can be used in C* where the programmer works: at the virtual processor level. To facilitate high performance within this framework, machine-independent modes are used. Modes change the performance of file operations, not their semantics, so programmers need not use ambiguous operations found in many parallel file systems. An automatic mode detection technique is presented that saves the programmer from extra syntax and low-level file system details. This mode detection system ensures that the most commonly encountered file operations are performed using high-performance modes. While the high-performance modes allow fast collective movement of file data, they must include optimizations for redistribution of file data, a common operation in production scientific code. This need is addressed at the file system level, where we provide enhancements to Disk-Directed I/O for redistributing file data. Two enhancements are geared to speeding fine-grained redistributions. One uses a two-phase, or indirect, approach to redistributing data among compute nodes. The other relies on I/O nodes to guide the redistribution by building packets bound for compute nodes. We model the performance of these enhancements and identify the key parameters that determine when each approach should be used. Finally, we introduce the notion of collective prefetching and identify its performance benefits and implementation tradeoffs.

ScholarsArchive@OSU

Towards architecture-adaptable parallel programming

Author: Kumaran Santhosh
Publication venue: 'Oregon State University'

There is a software gap in parallel processing. The short lifespan and small installation base of parallel architectures have made it economically infeasible to develop platform-specific parallel programming environments that deliver performance and programmability. One obvious solution is to build architecture-independent programming environments. But architecture independence usually comes at the expense of performance, since the most efficient parallel algorithm for solving a problem often depends on the target platform. Thus, unless a parallel programming system has the ability to adapt the algorithm to the architecture, it will not be effectively machine-independent. This research develops a new methodology for architecture-adaptable parallel programming. The methodology is built on three key ideas: (1) the use of a database of parameterized algorithmic templates to represent computable functions; (2) frame-based representation of processing environments; and (3) the use of an analytical performance prediction tool for automatic algorithm design. This methodology pursues a problem-oriented approach to parallel processing as opposed to the traditional algorithm-oriented approach. This enables the development of software environments with a high level of abstraction. The users state the problem to be solved using a high-level notation; they are freed from the esoteric tasks of parallel algorithm design and implementation. This methodology has been validated in the form of a prototype system capable of automatically generating an efficient parallel program when presented with a well-defined problem and the description of a target platform. The use of object technology has made the system easily extensible. The templates are designed using a parallel adaptation of the well-known divide-and-conquer paradigm. The prototype system has been used to solve several numerical problems efficiently on a wide spectrum of architectures. The target platforms include multicomputers (Thinking Machines CM-5 and Meiko CS-2), networks of workstations (IBM RS/6000s connected by FDDI), multiprocessors (Sequent Symmetry, SGI Power Challenge, and Sun SPARCServer), and a hierarchical system consisting of a cluster of multiprocessors on Myrinet.

ScholarsArchive@OSU