Characterization of message-passing overhead on the AP3000 multicomputer
This is a post-peer-review, pre-copyedit version. The final authenticated version is available online at: http://dx.doi.org/10.1109/ICPP.2001.952077

[Abstract] The performance of the communication primitives of parallel computers is critical for overall system performance. Characterizing the communication overhead is essential for estimating the global performance of parallel applications and for detecting possible bottlenecks. In this paper, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics.

Ministerio de Ciencia y Tecnología; 1FD97-0118-C02
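Abstracts like the one above characterize message-passing overhead with general latency models; a common form (not necessarily the one used in this paper) is the linear, Hockney-style point-to-point model T(m) = t_s + m/B, where t_s is the startup latency and B the asymptotic bandwidth. A minimal sketch with assumed, illustrative parameters:

```python
def transfer_time(m_bytes, t_s=20e-6, bandwidth=200e6):
    """Predicted time to send an m-byte message under the linear model.

    t_s and bandwidth are hypothetical values, not measurements from
    the AP3000 study; real characterizations fit them per library.
    """
    return t_s + m_bytes / bandwidth

# A 1 MB message under these assumed parameters:
t = transfer_time(1e6)  # 20 us startup + 5 ms transfer = ~5.02 ms
```

Fitting t_s and B separately per primitive (send, broadcast, reduce) is what lets such models expose where the overhead of each library lies.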
Servet: A Benchmark Suite for Autotuning on Multicore Clusters
This is a post-peer-review, pre-copyedit version of an article published in 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). Proceedings. The final authenticated version is available online at: http://dx.doi.org/10.1109/IPDPS.2010.5470358

[Abstract] MapReduce is a powerful tool for processing the large data sets used by many applications running in distributed environments. However, despite the increasing number of computationally intensive problems that require low-latency communications, the adoption of MapReduce in High Performance Computing (HPC) is still emerging. Here, languages based on the Partitioned Global Address Space (PGAS) programming model have been shown to be a good choice for implementing parallel applications, as they take advantage of the increasing number of cores per node and offer the programmability benefits of a global memory view, such as transparent access to remote data. This paper presents the first PGAS-based MapReduce implementation that uses the Unified Parallel C (UPC) language, which (1) obtains programmability benefits in parallel programming, (2) offers advanced configuration options to define a customized load distribution for different codes, and (3) overcomes performance penalties and bottlenecks that have traditionally prevented the deployment of MapReduce applications in HPC. The performance evaluation of representative applications on shared and distributed memory environments assesses the scalability of the presented MapReduce framework, confirming its suitability.

Xunta de Galicia; INCITE08PXIB105161PR
Ministerio de Ciencia e Innovación; TIN2007-67537-C03-02
Ministerio de Educación; FPU; AP2008-0157
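The UPC implementation itself is not reproduced here, but the MapReduce pattern the abstract describes can be sketched in a few lines of plain Python; word count is the canonical example, and the function names are illustrative, not from the paper:

```python
from collections import defaultdict

def map_phase(chunks, map_fn):
    """Apply the user-supplied map function to each input chunk,
    producing a flat list of (key, value) pairs."""
    pairs = []
    for chunk in chunks:
        pairs.extend(map_fn(chunk))
    return pairs

def reduce_phase(pairs, reduce_fn):
    """Group intermediate pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

# Word count over two "chunks" (standing in for distributed input splits).
chunks = ["to be or not", "to be"]
pairs = map_phase(chunks, lambda text: [(w, 1) for w in text.split()])
counts = reduce_phase(pairs, sum)
# counts == {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a PGAS setting the grouping step is where a global memory view pays off: threads can read remote intermediate pairs transparently instead of serializing them through explicit messages.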
High-performance data-parallel input/output
Existing parallel file systems are proving inadequate in two important arenas: programmability and performance. Both of these inadequacies can largely be traced to the fact that nearly all parallel file systems evolved from Unix and rely on a Unix-oriented, single-stream, block-at-a-time approach to file I/O. This one-size-fits-all approach to parallel file systems is inadequate for supporting applications running on distributed-memory parallel computers.

This research provides a migration path away from the traditional approaches to parallel I/O at two levels. At the level seen by the programmer, we show how file operations can be closely integrated with the semantics of a parallel language. Principles for this integration are illustrated in their application to C*, a virtual-processor-oriented language. The result is that traditional C file operations with familiar semantics can be used in C* at the virtual-processor level, where the programmer works. To facilitate high performance within this framework, machine-independent modes are used. Modes change the performance of file operations, not their semantics, so programmers need not use the ambiguous operations found in many parallel file systems. An automatic mode detection technique is presented that saves the programmer from extra syntax and low-level file system details. This mode detection system ensures that the most commonly encountered file operations are performed using high-performance modes.

While the high-performance modes allow fast collective movement of file data, they must include optimizations for redistribution of file data, a common operation in production scientific code. This need is addressed at the file system level, where we provide enhancements to Disk-Directed I/O for redistributing file data. Two enhancements are geared toward speeding fine-grained redistributions. One uses a two-phase, or indirect, approach to redistributing data among compute nodes. The other relies on I/O nodes to guide the redistribution by building packets bound for compute nodes. We model the performance of these enhancements and determine the key parameters governing when each approach should be used. Finally, we introduce the notion of collective prefetching and identify its performance benefits and implementation tradeoffs.
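The two-phase (indirect) redistribution mentioned above can be sketched under simplifying assumptions: a contiguous block read in phase one, a cyclic target distribution in phase two, and plain lists standing in for compute nodes. None of these names come from the thesis; they only illustrate the two-step idea.

```python
def phase_one_read(file_data, n_nodes):
    """Phase 1: each node reads one large contiguous block of the file,
    keeping the I/O requests coarse-grained and fast."""
    block = len(file_data) // n_nodes
    return [file_data[i * block:(i + 1) * block] for i in range(n_nodes)]

def phase_two_exchange(blocks, n_nodes):
    """Phase 2: nodes exchange elements among themselves to reach the
    target (here, cyclic) distribution, moving the fine-grained
    shuffling off the file system and onto the interconnect."""
    flat = [x for block in blocks for x in block]
    return [flat[rank::n_nodes] for rank in range(n_nodes)]

file_data = list(range(8))
blocks = phase_one_read(file_data, 2)    # node 0: [0..3], node 1: [4..7]
cyclic = phase_two_exchange(blocks, 2)   # node 0: [0,2,4,6], node 1: [1,3,5,7]
```

The payoff is that the file system only ever sees two large contiguous reads; the many small transfers implied by the cyclic layout happen node-to-node, which is the trade the modeling in the abstract weighs against the packet-building alternative.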