17 research outputs found

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

Author: Foster I.
Karonis N. T.
Toonen B.
Publication date: 01/01/2002

Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations. Comment: 20 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids

Author: de Supinski B.
Foster I.
Gropp W.
Karonis N. T.
Lusk E.
Publication date: 01/01/2002

The efficient implementation of collective communication operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution. We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation. Comment: 16 pages, 8 figures
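
Illustration (not part of the record): the idea of a multilevel topology-aware tree can be sketched with a toy model. The code below is an assumption-laden sketch, not the MPICH-G2 implementation: the function names, the (site, machine) labels, and the textbook binomial tree used as the topology-unaware baseline are all hypothetical choices made for illustration. It counts how many broadcast messages cross each network level.

```python
from collections import Counter

def binomial_edges(ranks):
    """(sender, receiver) edges of a textbook binomial broadcast rooted at ranks[0]."""
    edges = []
    step = 1
    while step < len(ranks):
        for i in range(step):
            if i + step < len(ranks):
                edges.append((ranks[i], ranks[i + step]))
        step *= 2
    return edges

def multilevel_edges(ranks, labels, level=0):
    """Broadcast rooted at ranks[0] that crosses each level once per group (toy model)."""
    depth = len(labels[ranks[0]])
    if level == depth:
        # All ranks share every label: fall back to a plain binomial tree.
        return binomial_edges(ranks)
    groups = {}
    for r in ranks:  # dicts keep insertion order, so the root's group comes first
        groups.setdefault(labels[r][level], []).append(r)
    leaders = [members[0] for members in groups.values()]
    edges = binomial_edges(leaders)  # the root reaches one leader per group
    for members in groups.values():
        edges += multilevel_edges(members, labels, level + 1)
    return edges

def slowest_level(a, b, labels):
    """Outermost level at which a and b differ (0 = wide area in this example)."""
    for lvl, (x, y) in enumerate(zip(labels[a], labels[b])):
        if x != y:
            return lvl
    return len(labels[a])

def count_by_level(edges, labels):
    """Number of messages whose slowest link is at each level."""
    return Counter(slowest_level(a, b, labels) for a, b in edges)

# Hypothetical layout: 2 sites, 2 machines per site, 2 processes per machine.
labels = [("A", "m0"), ("A", "m0"), ("A", "m1"), ("A", "m1"),
          ("B", "m2"), ("B", "m2"), ("B", "m3"), ("B", "m3")]
flat = binomial_edges(list(range(8)))
multi = multilevel_edges(list(range(8)), labels)
print("flat      :", dict(count_by_level(flat, labels)))
print("multilevel:", dict(count_by_level(multi, labels)))
```

In this toy layout both trees send seven messages, but the flat tree sends four of them across the site boundary while the multilevel tree sends one. Real costs depend on the actual latencies and on the baseline tree a given MPI library uses.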

arXiv.org e-Print Archive

CiteSeerX

The quandary of benchmarking broadcasts

Author: Karonis N T
de Supinski B R
Publication venue: Lawrence Livermore National Laboratory
Publication date: 05/02/1999

A message passing library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance. Measuring broadcast performance is not simple. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. Our method introduces a measurable overhead to eliminate the pipelining effect.

UNT Digital Library

Results from a Prototype Proton-CT Head Scanner

Author: Bashkirov V. A.
Coutrakon G.
Giacometti V.
Johnson R. P.
Karbasi P.
Karonis N. T.
Ordoñez C. E.
Pankuch M.
Sadrozinski H. F. -W.
Schubert K. E.
Schulte R. W.
Publication venue: Elsevier BV
Publication date: 01/01/2017

We are exploring low-dose proton radiography and computed tomography (pCT) as techniques to improve the accuracy of proton treatment planning and to provide artifact-free images for verification and adaptive therapy at the time of treatment. Here we report on comprehensive beam test results with our prototype pCT head scanner. The detector system and data acquisition attain a sustained rate of more than a million protons individually measured per second, allowing a full CT scan to be completed in six minutes or less of beam time. In order to assess the performance of the scanner for proton radiography as well as computed tomography, we have performed numerous scans of phantoms at the Northwestern Medicine Chicago Proton Center including a custom phantom designed to assess the spatial resolution, a phantom to assess the measurement of relative stopping power, and a dosimetry phantom. Some images, performance, and dosimetry results from those phantom scans are presented together with a description of the instrument, the data acquisition system, and the calibration methods. Comment: Conference on the Application of Accelerators in Research and Industry, CAARI 2016, 30 October to 4 November 2016, Ft. Worth, TX, US

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Research Online

A grid-enabled MPI: message passing in heterogeneous distributed computing systems

Author: Foster I.
Karonis N. T.
Publication venue: Argonne National Laboratory
Publication date: 30/11/2000

Application development for high-performance distributed computing systems, or computational grids as they are sometimes called, requires grid-enabled tools that hide mundane aspects of the heterogeneous grid environment without compromising performance. As part of an investigation of these issues, the authors have developed MPICH-G, a grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers at different sites using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus grid toolkit. In this paper, the authors describe the MPICH-G implementation and present preliminary performance results.

UNT Digital Library

Accurately measuring MPI broadcasts in a computational grid

Author: de Supinski B R
Karonis N T
Publication venue: Lawrence Livermore National Laboratory
Publication date: 06/05/1999

An MPI library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance, even in a challenging grid environment. Measuring broadcast performance is not easy. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. This problem becomes even more challenging in grid environments. Latencies along different links can vary significantly. Thus, an algorithm's performance is difficult to predict from its communication pattern. Even when accurate prediction is possible, the pattern is often unknown. Our method introduces a measurable overhead to eliminate the pipelining effect, regardless of variations in link latencies.
[...] choose between different available implementations. Also, accurate and complete measurements could guide use of a given implementation to improve application performance. These choices will become even more important as grid-enabled MPI libraries [6, 7] become more common, since bad choices are likely to cost significantly more in grid environments. In short, the distributed processing community needs accurate, succinct, and complete measurements of collective communications performance. Since successive collective communications can often proceed concurrently, accurately measuring them is difficult.
Some benchmarks use knowledge of the communication algorithm to predict the timing of events and, thus, eliminate concurrency between the collective communications that they measure. However, accurate event timing predictions are often impossible since network delays and local processing overheads are stochastic. Further, reasonable predictions are not possible if source code of the implementation is unavailable to the benchmark. We focus on measuring the performance of broadcast communication.
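
Illustration (not part of the record): the pipelining effect described in this abstract can be shown with simple arithmetic. The sketch below is a toy model under stated assumptions and does not reproduce the authors' measurement method. It assumes a hypothetical linear chain of processes, a fixed per-hop latency, and a root that injects a new broadcast every `send_gap` time units, with `send_gap` at least the per-hop forwarding time so that no queuing occurs. All names and numbers are invented for illustration.

```python
def true_latency(n_procs, hop_latency):
    """Time for one isolated broadcast to reach the last process of a chain."""
    return (n_procs - 1) * hop_latency

def pipelined_total(n_broadcasts, n_procs, hop_latency, send_gap):
    """Time until the last process receives the last of N back-to-back broadcasts.

    Broadcast k (k = 0, 1, ...) leaves the root at k * send_gap and then needs
    (n_procs - 1) hops, so successive broadcasts overlap in the network.
    """
    return (n_broadcasts - 1) * send_gap + true_latency(n_procs, hop_latency)

def naive_per_broadcast(n_broadcasts, n_procs, hop_latency, send_gap):
    """Per-broadcast time a naive benchmark would report: total time divided by N."""
    return pipelined_total(n_broadcasts, n_procs, hop_latency, send_gap) / n_broadcasts

# Hypothetical numbers: 8 processes, 10 time units per hop, one injection per 10 units.
naive = naive_per_broadcast(100, 8, 10, 10)
actual = true_latency(8, 10)
print(f"naive estimate: {naive}, isolated broadcast: {actual}")
```

With these made-up numbers the naive average is far below the latency of a single isolated broadcast, which is the kind of inaccuracy that back-to-back timing introduces.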

UNT Digital Library

HyMPI – A MPI Implementation for Heterogeneous High Performance Systems

Author: A. Geist
G. Fagg
M. Snir
N. Karonis
T. Imamura
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Crossref

A Grid-Aware Branch, Cut and Price Implementation

Author: J. Beasley
J. Novotny
K. Aida
N. Karonis
T. Ralphs
T.K. Ralphs
Y. Shinano
Publication venue: country:DEU
Publication date: 01/01/2005

This paper presents a grid-enabled system for solving large-scale optimization problems. The system has been developed using Globus and MPICH-G2 grid technologies, and consists of two BCP solvers and an interface portal. After a brief introduction to Branch, Cut and Price optimization algorithms, the system architecture, the solvers, and the portal user interface are described. Finally, some of the tests performed and the results obtained are presented.

Crossref

Archivio della Ricerca - Università di Salerno

Exploiting hierarchy in parallel computer networks to optimize collective operations performance

Author: Bresnahan J.
de Supinski B. R.
Foster I.
Gropp W.
Karonis N. T.
Lusk E.
Publication venue: Argonne National Laboratory
Publication date: 04/02/2000

The efficient implementation of collective communication operations has received much attention. Initial efforts modeled network communication and produced optimal trees based on those models. However, the models used by these initial efforts assumed equal point-to-point latencies between any two processes. This assumption is violated in heterogeneous systems such as clusters of SMPs and wide-area computational grids, and as a result, collective operations that utilize the trees generated by these models perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. The authors present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees, they take advantage of communication cost differences at every level in the network. They used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G, the Globus-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology discovered by Globus, they construct these topology-aware trees automatically during execution, thus freeing the MPI application programmer from having to write special files or functions to describe the topology to the MPICH library. They present results demonstrating the advantages of their multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation.

UNT Digital Library

P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs on Grids

Author: Choopan Rattanapoka
D.H. Bailey
E. Gabriel
F. Schneider
G. Fedak
N. Budhiraja
N.T. Karonis
R. Thakur
S. Louca
Stéphane Genaud
T. Kielmann
W. Chase
Publication venue: Springer Science and Business Media LLC

Crossref