Search CORE

11,023 research outputs found

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

Author: Foster I.
Karonis N. T.
Toonen B.
Publication venue
Publication date: 01/01/2002
Field of study

Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations.Comment: 20 pages, 8 figure

arXiv.org e-Print Archive

CiteSeerX

A grid middleware for distributed Java computing with MPI binding and process migration supports

Author: Chen L
Lau FCM
Wang CL
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

"Grid" computing has emerged as an important new research field. With years of efforts, grid researchers have successfully developed grid technologies including security solutions, resource management protocols, information query protocols, and data management services. However, as the ultimate goal of grid computing is to design an infrastructure which supports dynamic, cross-organizational resource sharing, there is a need of solutions for efficient and transparent task re-scheduling in the grid. In this research, a new grid middleware is proposed, called G-JavaMPI. This middleware adds the parallel computing capability of Java to the grid with the support of a Grid-enabled message passing interface (MPI) for inter-process communication between Java processes executed at different grid points. A special feature of the proposed G-JavaMPI is the support of Java process migration with post-migration message redirection. With these supports, it is possible to migrate executing Java process from site to site for continuous computation, if some site is scheduled to be turned down for system reconfiguration. Moreover, the proposed G-JavaMPI middleware is very portable since it requires no modification of underlying OS, Java virtual machine, and MPI package. Preliminary performance tests have been conducted. The proposed mechanisms have shown good migration efficiency in a simulated grid environment.postprin

HKU Scholars Hub

CrossBroker: A Grid Metascheduler for Interactive and Parallel Jobs

Author: Cencerrado Andres
Fernández Enol
Heymann Elisa
Senar Miquel A.
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 27/01/2012
Field of study

Execution of parallel and interactive applications on a Grid environment is a challenging problem that requires the cooperation of several middleware tools and services. In this paper, we present our experiences in the development of CrossBroker, a job management service that provides transparent and reliable support for such types of applications. We outline the main components of CrossBroker and how they interact with other middleware services. We also describe specific features of the scheduler used to guarantee resource co-allocation for running MPI jobs remotely over multiple machines spread across several Grid sites or to start interactive applications as fast as possible. These features include a simple time-sharing mechanism that allows fast execution of interactive applications even under heavy occupancy of Grid resources

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Development of Grid e-Infrastructure in South-Eastern Europe

Author: A Balaž
Antun Balaž
B Novaković
Boro Jakimovski
C Ozturan
D Davidović
D Syrakov
Dušan Vudragović
Emanouil Atanassov
I Vidanović
I Vidanović
IE Stanković
Ioannis Liabotis
K Lagouvardos
Mihajlo Savić
MV Milovanović
Ognjen Prnjat
V Kotroni
Vladimir Slavnić
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/12/2011
Field of study

Over the period of 6 years and three phases, the SEE-GRID programme has established a strong regional human network in the area of distributed scientific computing and has set up a powerful regional Grid infrastructure. It attracted a number of user communities and applications from diverse fields from countries throughout the South-Eastern Europe. From the infrastructure point view, the first project phase has established a pilot Grid infrastructure with more than 20 resource centers in 11 countries. During the subsequent two phases of the project, the infrastructure has grown to currently 55 resource centers with more than 6600 CPUs and 750 TBs of disk storage, distributed in 16 participating countries. Inclusion of new resource centers to the existing infrastructure, as well as a support to new user communities, has demanded setup of regionally distributed core services, development of new monitoring and operational tools, and close collaboration of all partner institution in managing such a complex infrastructure. In this paper we give an overview of the development and current status of SEE-GRID regional infrastructure and describe its transition to the NGI-based Grid model in EGI, with the strong SEE regional collaboration.Comment: 22 pages, 12 figures, 4 table

arXiv.org e-Print Archive

Crossref

Recommended from our members

Leveraging legacy codes to distributed problem solving environments: A web service approach

Author: Huang Y
Li M
Rana O
Walker D
Ward R
Williams P
Publication venue: 'Elsevier BV'
Publication date: 01/01/2003
Field of study

This paper describes techniques used to leverage high performance legacy codes as CORBA components to a distributed problem solving environment. It first briefly introduces the software architecture adopted by the environment. Then it presents a CORBA oriented wrapper generator (COWG) which can be used to automatically wrap high performance legacy codes as CORBA components. Two legacy codes have been wrapped with COWG. One is an MPI-based molecular dynamic simulation (MDS) code, the other is a finite element based computational fluid dynamics (CFD) code for simulating incompressible Navier-Stokes flows. Performance comparisons between runs of the MDS CORBA component and the original MDS legacy code on a cluster of workstations and on a parallel computer are also presented. Wrapped as CORBA components, these legacy codes can be reused in a distributed computing environment. The first case shows that high performance can be maintained with the wrapped MDS component. The second case shows that a Web user can submit a task to the wrapped CFD component through a Web page without knowing the exact implementation of the component. In this way, a user’s desktop computing environment can be extended to a high performance computing environment using a cluster of workstations or a parallel computer

Brunel University Research Archive

Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

Author: Berriman G. Bruce
Deelman Ewa
Good John
Jacob Joseph C.
Katz Daniel S.
Kesselman Carl
Laity Anastasia C.
Prince Thomas A.
Singh Gurmeet
Su Mei-Hui
Williams Roy
Publication venue
Publication date: 01/01/2009
Field of study

Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid.Comment: 16 pages, 11 figure

arXiv.org e-Print Archive

Crossref

A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids

Author: de Supinski B.
Foster I.
Gropp W.
Karonis N. T.
Lusk E.
Publication venue
Publication date: 01/01/2002
Field of study

The efficient implementation of collective communiction operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit[tm]-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution. We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation.Comment: 16 pages, 8 figure

arXiv.org e-Print Archive

CiteSeerX

EGI user forum 2011 : book of abstracts

Author
Publication venue
Publication date: 01/01/2011
Field of study

Hochschulschriftenserver - Universität Frankfurt am Main

Checkpointing as a Service in Heterogeneous Cloud Environments

Author: Cao Jiajun
Cooperman Gene
Morin Christine
Simonin Matthieu
Publication venue
Publication date: 07/11/2014
Field of study

A non-invasive, cloud-agnostic approach is demonstrated for extending existing cloud platforms to include checkpoint-restart capability. Most cloud platforms currently rely on each application to provide its own fault tolerance. A uniform mechanism within the cloud itself serves two purposes: (a) direct support for long-running jobs, which would otherwise require a custom fault-tolerant mechanism for each application; and (b) the administrative capability to manage an over-subscribed cloud by temporarily swapping out jobs when higher priority jobs arrive. An advantage of this uniform approach is that it also supports parallel and distributed computations, over both TCP and InfiniBand, thus allowing traditional HPC applications to take advantage of an existing cloud infrastructure. Additionally, an integrated health-monitoring mechanism detects when long-running jobs either fail or incur exceptionally low performance, perhaps due to resource starvation, and proactively suspends the job. The cloud-agnostic feature is demonstrated by applying the implementation to two very different cloud platforms: Snooze and OpenStack. The use of a cloud-agnostic architecture also enables, for the first time, migration of applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1