MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface
Application development for distributed computing "Grids" can benefit from
tools that variously hide or enable application-level management of critical
aspects of the heterogeneous environment. As part of an investigation of these
issues, we have developed MPICH-G2, a Grid-enabled implementation of the
Message Passing Interface (MPI) that allows a user to run MPI programs across
multiple computers, at the same or different sites, using the same commands
that would be used on a parallel computer. This library extends the Argonne
MPICH implementation of MPI to use services provided by the Globus Toolkit for
authentication, authorization, resource allocation, executable staging, and
I/O, as well as for process creation, monitoring, and control. Various
performance-critical operations, including startup and collective operations,
are configured to exploit network topology information. The library also
exploits MPI constructs for performance management; for example, the MPI
communicator construct is used for application-level discovery of, and
adaptation to, both network topology and network quality-of-service mechanisms.
We describe the MPICH-G2 design and implementation, present performance
results, and review application experiences, including record-setting
distributed simulations.
Comment: 20 pages, 8 figures
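The abstract's point about using the MPI communicator construct for application-level topology discovery can be illustrated with a minimal sketch. This is plain Python (no real MPI), and the per-rank "color per level" encoding is a hypothetical stand-in for the topology information a Grid-enabled library could expose; grouping ranks whose colors agree down to a given level mirrors the membership an `MPI_Comm_split` at that level would produce.

```python
def split_by_level(topology, level):
    """Group ranks that share colors on levels 0..level.

    topology: dict rank -> tuple of colors, one per network level
              (level 0 = widest, e.g. site; last = narrowest, e.g. host).
    """
    groups = {}
    for rank, colors in topology.items():
        # the color prefix plays the role of MPI_Comm_split's color argument
        groups.setdefault(colors[: level + 1], []).append(rank)
    return {key: sorted(members) for key, members in groups.items()}

# Hypothetical 6-rank run: two sites, site 0 split across two hosts.
topo = {
    0: (0, 0, 0), 1: (0, 0, 0),   # site 0, LAN 0, host 0
    2: (0, 0, 1), 3: (0, 0, 1),   # site 0, LAN 0, host 1
    4: (1, 0, 0), 5: (1, 0, 0),   # site 1, LAN 0, host 0
}

site_groups = split_by_level(topo, 0)   # widest level: per-site groups
host_groups = split_by_level(topo, 2)   # narrowest level: per-host groups
```

An application holding such groups can, for instance, route bulk traffic within a site group and reserve quality-of-service treatment for the inter-site channel.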
A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids
The efficient implementation of collective communication operations has
received much attention. Initial efforts produced "optimal" trees based on
network communication models that assumed equal point-to-point latencies
between any two processes. This assumption is violated in most practical
settings, however, particularly in heterogeneous systems such as clusters of
SMPs and wide-area "computational Grids," with the result that collective
operations perform suboptimally. In response, more recent work has focused on
creating topology-aware trees for collective operations that minimize
communication across slower channels (e.g., a wide-area network). While these
efforts have significant communication benefits, they all limit their view of
the network to only two layers. We present a strategy based upon a multilayer
view of the network. By creating multilevel topology-aware trees we take
advantage of communication cost differences at every level in the network. We
used this strategy to implement topology-aware versions of several MPI
collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of
the popular MPICH implementation of the MPI standard. Using information about
topology provided by MPICH-G2, we construct these multilevel topology-aware
trees automatically during execution. We present results demonstrating the
advantages of our multilevel approach by comparing it to the default
(topology-unaware) implementation provided by MPICH and a topology-aware
two-layer implementation.
Comment: 16 pages, 8 figures
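The multilevel idea can be sketched in a few lines. The following toy construction (plain Python, no MPI; the color-per-level topology encoding is a hypothetical stand-in for the information MPICH-G2 provides) builds a broadcast tree in which, at each level, a single representative per group receives the message over the slower channel and then recurses within its group at the next, faster level, so that only one message crosses each slow channel.

```python
def multilevel_bcast_tree(topology, root):
    """Return (parent, child) edges of a multilevel broadcast tree.

    topology: dict rank -> tuple of colors, one per network level
              (level 0 = widest/slowest, last = narrowest/fastest).
    """
    edges = []
    depth = len(topology[root])

    def recurse(ranks, local_root, level):
        if level == depth:
            # fastest domain reached: flat fan-out from the local root
            edges.extend((local_root, r) for r in ranks if r != local_root)
            return
        groups = {}
        for r in ranks:
            groups.setdefault(topology[r][level], []).append(r)
        for _, members in sorted(groups.items()):
            # one representative per group receives over the slow channel
            rep = local_root if local_root in members else min(members)
            if rep != local_root:
                edges.append((local_root, rep))
            recurse(members, rep, level + 1)

    recurse(sorted(topology), root, 0)
    return edges

# Hypothetical 6-rank run: two sites, site 0 split across two hosts.
topo = {
    0: (0, 0, 0), 1: (0, 0, 0),
    2: (0, 0, 1), 3: (0, 0, 1),
    4: (1, 0, 0), 5: (1, 0, 0),
}
tree = multilevel_bcast_tree(topo, 0)
```

In this example only the single edge from rank 0 to rank 4 crosses the wide-area (level-0) boundary; a topology-unaware tree could send several messages over that slow link.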
Identifying Logical Homogeneous Clusters for Efficient Wide-area Communications
Recently, many works have focused on implementing collective communication
operations adapted to wide-area computational systems, such as computational
Grids or global-computing platforms. Due to the inherent heterogeneity of such
environments, most works separate "clusters" into different hierarchy levels
to better model the communication. However, in our opinion, such works do not
give enough attention to the delimitation of these clusters, as they normally
use machine locality or the IP subnet to delimit a cluster without verifying
the "homogeneity" of the resulting clusters. In this paper, we describe a
strategy to gather network information from different local-area networks and
to construct "logical homogeneous clusters" that are better suited to
performance modelling.
Comment: http://www.springerlink.com/index/TTJJL61R1EXDLCM
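The contrast with subnet-based delimitation can be illustrated with a hypothetical sketch (this is not the paper's algorithm): machines are grouped by measured latency, and a machine joins a cluster only if its latency to every current member stays below a homogeneity threshold.

```python
def logical_clusters(machines, latency, threshold):
    """Greedily partition machines into latency-homogeneous clusters.

    latency[a][b]: measured round-trip latency between a and b (ms).
    A machine joins the first cluster whose members are all reachable
    within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for m in machines:
        for cluster in clusters:
            if all(latency[m][n] <= threshold for n in cluster):
                cluster.append(m)
                break
        else:
            clusters.append([m])
    return clusters

# Four machines on one IP subnet, but "d" sits behind a slow link, so a
# subnet-based delimitation would wrongly place it with the others.
lat = {
    "a": {"b": 0.1, "c": 0.1, "d": 5.0},
    "b": {"a": 0.1, "c": 0.1, "d": 5.0},
    "c": {"a": 0.1, "b": 0.1, "d": 5.0},
    "d": {"a": 5.0, "b": 5.0, "c": 5.0},
}
clusters = logical_clusters(["a", "b", "c", "d"], lat, threshold=1.0)
```

Here the measured latencies split "d" into its own logical cluster even though all four machines share a subnet, which is exactly the homogeneity check the abstract argues for.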
The quandary of benchmarking broadcasts
A message passing library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance. Measuring broadcast performance is not simple. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. Our method introduces a measurable overhead to eliminate the pipelining effect.
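The pipelining effect can be made concrete with a toy timing model (this is an illustration, not the paper's exact protocol): broadcasts propagate along a chain of p ranks, back-to-back timing at the root only ever observes the root's own first send, while waiting for an acknowledgement from the last rank serializes iterations at the cost of one extra, separately measurable hop that can then be subtracted.

```python
HOP = 1.0  # per-hop latency in a chain broadcast, arbitrary units

def completion_time(p):
    """True time for one chain broadcast to reach all p ranks."""
    return (p - 1) * HOP

def naive_estimate(p, n=100):
    """Back-to-back method: n broadcasts pipeline through the chain, so
    the root's elapsed time is n send latencies -- independent of p."""
    return (n * HOP) / n

def ack_estimate(p, n=100):
    """Ack method: the root waits each iteration for an ack from the
    last rank, then subtracts the measurable one-hop ack overhead."""
    per_iteration = completion_time(p) + HOP  # reach last rank + ack back
    return (n * per_iteration) / n - HOP
```

In this model the naive estimate stays at one hop no matter how deep the chain is, while the ack-based estimate recovers the true completion time once the known overhead is removed.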
Results from a Prototype Proton-CT Head Scanner
We are exploring low-dose proton radiography and computed tomography (pCT) as
techniques to improve the accuracy of proton treatment planning and to provide
artifact-free images for verification and adaptive therapy at the time of
treatment. Here we report on comprehensive beam test results with our prototype
pCT head scanner. The detector system and data acquisition attain a sustained
rate of more than a million protons individually measured per second, allowing
a full CT scan to be completed in six minutes or less of beam time. In order to
assess the performance of the scanner for proton radiography as well as
computed tomography, we have performed numerous scans of phantoms at the
Northwestern Medicine Chicago Proton Center including a custom phantom designed
to assess the spatial resolution, a phantom to assess the measurement of
relative stopping power, and a dosimetry phantom. Some images, performance, and
dosimetry results from those phantom scans are presented together with a
description of the instrument, the data acquisition system, and the calibration
methods.
Comment: Conference on the Application of Accelerators in Research and
Industry, CAARI 2016, 30 October to 4 November 2016, Ft. Worth, TX, US
Technologies and tools for high-performance distributed computing. Final report
In this project we studied the practical use of the MPI message-passing interface in advanced distributed computing environments. We built on the existing software infrastructure provided by the Globus Toolkit[tm], the MPICH portable implementation of MPI, and the MPICH-G integration of MPICH with Globus. As a result of this project we have replaced MPICH-G with its successor, MPICH-G2, which is also an integration of MPICH with Globus. MPICH-G2 delivers significant improvements in message-passing performance compared to its predecessor MPICH-G, and is based on superior software design principles, yielding a software base in which the functional extensions and improvements we made were much easier to implement. Using Globus services, we replaced the default implementation of MPI's collective operations in MPICH-G2 with more efficient multilevel topology-aware collective operations, which in turn led to the development of a new timing methodology for broadcasts [8]. MPICH-G2 was extended to include client/server functionality from the MPI-2 standard [23] to facilitate remote visualization applications and, through the use of MPI idioms, MPICH-G2 provided application-level control of quality-of-service parameters as well as application-level discovery of underlying Grid-topology information. Finally, MPICH-G2 was successfully used in a number of applications, including an award-winning, record-setting computation in numerical relativity. In the sections that follow we describe in detail the accomplishments of this project, present experimental results quantifying the performance improvements, and conclude with a discussion of our application experiences. This project resulted in a significant increase in the utility of MPICH-G2.