185 research outputs found
MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface
Application development for distributed computing "Grids" can benefit from
tools that variously hide or enable application-level management of critical
aspects of the heterogeneous environment. As part of an investigation of these
issues, we have developed MPICH-G2, a Grid-enabled implementation of the
Message Passing Interface (MPI) that allows a user to run MPI programs across
multiple computers, at the same or different sites, using the same commands
that would be used on a parallel computer. This library extends the Argonne
MPICH implementation of MPI to use services provided by the Globus Toolkit for
authentication, authorization, resource allocation, executable staging, and
I/O, as well as for process creation, monitoring, and control. Various
performance-critical operations, including startup and collective operations,
are configured to exploit network topology information. The library also
exploits MPI constructs for performance management; for example, the MPI
communicator construct is used for application-level discovery of, and
adaptation to, both network topology and network quality-of-service mechanisms.
We describe the MPICH-G2 design and implementation, present performance
results, and review application experiences, including record-setting
distributed simulations.
Comment: 20 pages, 8 figures
A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids
The efficient implementation of collective communication operations has
received much attention. Initial efforts produced "optimal" trees based on
network communication models that assumed equal point-to-point latencies
between any two processes. This assumption is violated in most practical
settings, however, particularly in heterogeneous systems such as clusters of
SMPs and wide-area "computational Grids," with the result that collective
operations perform suboptimally. In response, more recent work has focused on
creating topology-aware trees for collective operations that minimize
communication across slower channels (e.g., a wide-area network). While these
efforts have significant communication benefits, they all limit their view of
the network to only two layers. We present a strategy based upon a multilayer
view of the network. By creating multilevel topology-aware trees we take
advantage of communication cost differences at every level in the network. We
used this strategy to implement topology-aware versions of several MPI
collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of
the popular MPICH implementation of the MPI standard. Using information about
topology provided by MPICH-G2, we construct these multilevel topology-aware
trees automatically during execution. We present results demonstrating the
advantages of our multilevel approach by comparing it to the default
(topology-unaware) implementation provided by MPICH and a topology-aware
two-layer implementation.
Comment: 16 pages, 8 figures
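The multilevel idea in the abstract above can be illustrated with a minimal sketch. It is not the MPICH-G2 implementation: the function names, the tuple-based topology description, and the flat fan-out inside each cluster are all assumptions made for illustration. Each rank is described by a tuple of cluster identifiers, coarsest level first, and the tree sends exactly one message into each remote cluster at every level before recursing inside it.

```python
from collections import defaultdict


def build_multilevel_tree(root, topology, level=0, members=None):
    """Return (sender, receiver) edges for a broadcast rooted at `root`.

    `topology` maps rank -> tuple of cluster ids, coarsest level first
    (hypothetical format, e.g. (site, machine)).
    """
    if members is None:
        members = sorted(topology)
    if level == len(topology[root]):
        # All members share the finest-level cluster: send directly.
        return [(root, r) for r in members if r != root]

    # Partition the current members by their cluster id at this level.
    groups = defaultdict(list)
    for rank in members:
        groups[topology[rank][level]].append(rank)

    edges = []
    leaders = {}
    for gid, ranks in groups.items():
        # The root leads its own group; other groups use their lowest rank.
        leaders[gid] = root if root in ranks else ranks[0]
        if leaders[gid] != root:
            edges.append((root, leaders[gid]))  # one message per remote group
    for gid, ranks in groups.items():
        edges.extend(build_multilevel_tree(leaders[gid], topology, level + 1, ranks))
    return edges


def count_crossings(edges, topology, level):
    """Count edges whose endpoints differ at the given topology level."""
    return sum(1 for s, r in edges if topology[s][level] != topology[r][level])
```

With two sites of two machines each and two ranks per machine, this sketch crosses the wide-area level once, whereas a flat broadcast from the root would cross it once per remote rank.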
A Real-time Image Reconstruction System for Particle Treatment Planning Using Proton Computed Tomography (pCT)
Proton computed tomography (pCT) is a novel medical imaging modality for
mapping the distribution of proton relative stopping power (RSP) in medical
objects of interest. Compared to conventional X-ray computed tomography, where
range uncertainty margins are around 3.5%, pCT has the potential to provide
more accurate measurements to within 1%. This improved efficiency will be
beneficial to proton-therapy planning and pre-treatment verification. A
prototype pCT imaging device has recently been developed capable of rapidly
acquiring low-dose proton radiographs of head-sized objects. We have also
developed an advanced, fast image reconstruction software based on distributed
computing that utilizes parallel processors and graphical processing units. The
combination of fast data acquisition and fast image reconstruction will enable
the availability of RSP images within minutes for use in clinical settings. The
performance of our image reconstruction software has been evaluated using data
collected by the prototype pCT scanner from several phantoms.
Comment: Paper presented at Conference on the Application of Accelerators in
Research and Industry, CAARI 2016, 30 October to 4 November 2016, Ft. Worth,
TX, US
Identifying Logical Homogeneous Clusters for Efficient Wide-area Communications
Many recent works focus on implementing collective communication
operations adapted to wide-area computational systems, such as computational
Grids or global computing. Due to the inherent heterogeneity of such
environments, most works separate "clusters" into different hierarchy levels
to better model the communication. However, in our opinion, such works do not give enough
attention to the delimitation of such clusters, as they normally use the
locality or the IP subnet of the machines to delimit a cluster without
verifying the "homogeneity" of such clusters. In this paper, we describe a
strategy to gather network information from different local-area networks and
to construct "logical homogeneous clusters", better suited to the performance
modelling.
Comment: http://www.springerlink.com/index/TTJJL61R1EXDLCM
Recommended from our members
A methodology for string resolution
In this paper we present a methodology, not a tool. We present this methodology with the intent that it be adopted, on a case-by-case basis, by each of the existing tools in EPICS. In presenting this methodology, we describe each of its two components in detail and conclude with an example depicting how the methodology can be used across a pair of tools. The task of any control system is to provide access to the various components of the machine being controlled, for example, the Advanced Photon Source (APS). By access, we mean the ability to monitor the machine's status (reading) as well as the ability to explicitly change its status (writing). The Experimental Physics and Industrial Control System (EPICS) is a set of tools, designed to act in concert, that allows one to construct a control system. EPICS provides the ability to construct a control system that allows reading and writing access to the machine. It does this through the notion of databases. Each of the components of the APS that is accessed by the control system is represented in EPICS by a set of named database records. Once this abstraction is made, from physical device to named database records, the process of monitoring and changing the state of that device becomes the simple process of reading and writing information from and to its associated named records.
BURT: back up and restore tool
BURT is just one of the tools in the Experimental Physics and Industrial Control System (EPICS). In this document we address the problem of backing up and restoring sets of values in databases whose values are continuously changing. In doing so, we present the Back Up and Restore Tool (BURT). In this presentation we provide a theoretical framework that defines the problem and lays the foundation for its solution. BURT is a tool designed and implemented with respect to that theoretical framework. It is not necessary for users of BURT to have an understanding of that framework. It was included in this document only for the purpose of completeness. BURT's basic purpose is to back up sets of values so that they can be later restored. Each time a back up is requested, a new ASCII file is generated. Further, the data values are stored as ASCII strings and therefore not compressed. Both of these facts conspire against BURT as a candidate for an archiver. Users who need an archiver should use a different tool, the Archiver.
EZCA primer
This document provides a quick introduction to EZCA, a library that was designed to provide an easy-to-use interface to Channel Access (CA). As such, this document is not a user's manual, where a more detailed explanation of EZCA can be found. In short, this document is designed to get users to a state where they can be writing EZCA code as quickly as possible. It is not a document that answers all EZCA questions.
A framework for back-up and restore under the Experimental Physics and Industrial Control System
EPICS is a system that allows one to design and implement a controls system. At its foundation, i.e., the level closest to the devices being controlled, are autonomous computers, each called an Input/Output Controller or IOC. In EPICS, devices controlled by an IOC are represented by software entities called process variables. All devices are monitored/controlled by reading/writing values from/to their associated process variables. Under this schema, distributing processing over a number of IOCs and representing devices with process variables, there are a variety of ways one can view or group the information in the control system. Two of the more common groupings are by IOC (location) and by devices (function). Simply stated, the authors require a system capable of restoring the state of the machine, in their case the Advanced Photon Source, to a known desired state from somewhere in the past. To that end, they propose a framework which describes a system that periodically records and preserves the values of key process variables so that later on, those values can be written to the machine in an attempt to restore it to that same state. One of the more powerful notions that must be preserved in any system that solves this problem is the independence between the specification of what is monitored and the specification of what is written. In other words, grouping process variables for monitoring must be kept independent of the number of different ways to group process variables (e.g., by IOC, by device, etc.) when they are written
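The independence between what is monitored and what is written, as described in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the process-variable names, and the dictionary standing in for the control system are hypothetical, and no real Channel Access or EPICS calls are used.

```python
import time


def take_snapshot(read_pv, monitored):
    """Record the current value of every monitored process variable.

    `read_pv` is any callable name -> value (a stand-in for a real read).
    """
    return {
        "taken_at": time.time(),
        "values": {name: read_pv(name) for name in monitored},
    }


def restore(write_pv, snapshot, write_group):
    """Write back only the snapshot values named in `write_group`.

    The write grouping (by IOC, by device, ...) is chosen at restore time
    and is independent of how the monitored set was specified.
    """
    written = []
    for name in write_group:
        if name in snapshot["values"]:
            write_pv(name, snapshot["values"][name])
            written.append(name)
    return written


# Hypothetical usage with an in-memory "machine".
machine = {"ioc1:magnet:current": 4.2, "ioc1:valve:open": 1, "ioc2:magnet:current": 3.9}
snap = take_snapshot(machine.get, list(machine))
machine["ioc1:magnet:current"] = 0.0
machine["ioc2:magnet:current"] = 0.0
magnets = [n for n in snap["values"] if ":magnet:" in n]  # group by device
restored = restore(machine.__setitem__, snap, magnets)
```

Here the snapshot covers all three variables, while the restore writes only the magnet group; a grouping by IOC could be applied to the same snapshot without retaking it.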
Bridging the gap between cluster and grid computing
The Internet computing model with its ubiquitous networking and computing infrastructure is driving a new class of interoperable applications that benefit both from high computing power and multiple Internet connections. In this context, grids are promising computing platforms that allow distributed resources such as workstations and clusters to be aggregated to solve large-scale problems. However, because most parallel programming tools were primarily developed for MPP and cluster computing, exploiting the new environment requires higher abstraction and cooperative interfaces. Rocmeμ is a platform originally designed to support the operation of multi-SAN clusters that integrates application modeling and resource allocation. In this paper we show how the underlying resource-oriented computation model provides the necessary abstractions to accommodate the migration from cluster to multicluster grid-enabled computing.
The quandary of benchmarking broadcasts
A message passing library's implementation of broadcast communication can significantly affect the performance of applications built with that library. In order to choose between similar implementations or to evaluate available libraries, accurate measurements of broadcast performance are required. As we demonstrate, existing methods for measuring broadcast performance are either inaccurate or inadequate. Fortunately, we have designed an accurate method for measuring broadcast performance. Measuring broadcast performance is not simple. Simply sending one broadcast after another allows them to proceed through the network concurrently, thus resulting in inaccurate per-broadcast timings. Existing methods either fail to eliminate this pipelining effect or eliminate it by introducing overheads that are as difficult to measure as the performance of the broadcast itself. Our method introduces a measurable overhead to eliminate the pipelining effect.
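The pipelining effect mentioned in the abstract above can be seen in a toy model. The sketch below is not the authors' measurement method; it assumes a simple chain of processes in which each hop takes a fixed time and the root issues the next broadcast as soon as its own send completes. The function name and parameters are hypothetical.

```python
def simulate_chain_broadcasts(num_procs, hop_time, repetitions):
    """Simulate back-to-back broadcasts along a chain 0 -> 1 -> ... -> p-1.

    Returns (naive_estimate, latencies): the root-side elapsed time divided
    by the number of broadcasts, and the true root-to-last-process latency
    of each individual broadcast.
    """
    send_done = [0.0] * num_procs  # when each process finished its last forward
    root_start = 0.0
    latencies = []
    for _ in range(repetitions):
        arrival = root_start
        for i in range(num_procs - 1):
            # Process i forwards once it holds the message and is free.
            send_done[i] = max(arrival, send_done[i]) + hop_time
            arrival = send_done[i]  # message reaches process i + 1
        latencies.append(arrival - root_start)
        root_start = send_done[0]  # root immediately starts the next one
    naive_estimate = send_done[0] / repetitions
    return naive_estimate, latencies
```

In this model, with eight processes and a unit hop time, the root-side estimate is one hop per broadcast, while every broadcast actually takes seven hops to reach the last process, because successive broadcasts overlap in the chain.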