7 research outputs found

    MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface

    Application development for distributed computing "Grids" can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance-critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations. (Comment: 20 pages, 8 figures)
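    As a concrete illustration of the communicator-based topology adaptation described above, the sketch below (in C) groups MPI processes by site and performs a two-level reduction: within each site first, then across site leaders. It is a minimal sketch, not MPICH-G2's actual interface; the SITE_ID environment variable and the leader communicator are assumptions standing in for the topology information MPICH-G2 exposes through the communicator construct.

    /* Minimal sketch (not MPICH-G2's API): group ranks by "site" so the
       application can do site-local collectives over fast links before a
       small inter-site step over the WAN. SITE_ID is a hypothetical
       stand-in for runtime-provided topology information. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* Hypothetical site identifier; defaults to 0 if not provided. */
        const char *s = getenv("SITE_ID");
        int site = s ? atoi(s) : 0;

        /* All ranks at the same site share one subcommunicator. */
        MPI_Comm site_comm;
        MPI_Comm_split(MPI_COMM_WORLD, site, world_rank, &site_comm);

        int site_rank;
        MPI_Comm_rank(site_comm, &site_rank);

        /* Reduce within each site first (fast links)... */
        double local = 1.0, site_sum = 0.0;
        MPI_Reduce(&local, &site_sum, 1, MPI_DOUBLE, MPI_SUM, 0, site_comm);

        /* ...then let the site leaders combine the partial sums over the WAN. */
        MPI_Comm leaders;
        MPI_Comm_split(MPI_COMM_WORLD, site_rank == 0 ? 0 : MPI_UNDEFINED,
                       world_rank, &leaders);
        if (leaders != MPI_COMM_NULL) {
            double total = 0.0;
            MPI_Reduce(&site_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, leaders);
            if (world_rank == 0)
                printf("global sum = %f\n", total);
            MPI_Comm_free(&leaders);
        }

        MPI_Comm_free(&site_comm);
        MPI_Finalize();
        return 0;
    }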

    Dynamic Load Balancing of SAMR Applications on Distributed Systems


    Technologies and tools for high-performance distributed computing. Final report


    VCluster: A Portable Virtual Computing Library for Cluster Computing

    Message passing has been the dominant parallel programming model in cluster computing, and libraries like the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) have proven their utility and efficiency in numerous applications across diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted to support this new kind of cluster efficiently. In addition, the Java programming language, with its object-oriented architecture, platform-independent bytecode, and native support for multithreading, is an attractive alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, using the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes mobile threads possible. Equipped with virtual migrating threads, it is feasible to balance the load of computing resources dynamically. Several large-scale parallel applications have been developed using VCluster to compare its performance and usability with other libraries. The experiments show that VCluster makes it easier to develop multithreaded parallel applications than conventional libraries like MPI, while its performance is comparable to MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Threads and OpenMP. In the next phase of our work, we implemented thread groups and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. Our experiments show that the load can be dynamically balanced in VCluster, resulting in better performance. Thread groups also make it possible to implement collective communication functions between threads, which have proven useful in process-based libraries.
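    For contrast with VCluster's migrating-thread model, the sketch below shows the conventional hybrid baseline the abstract compares against: MPI for message passing between processes and OpenMP threads for shared memory inside each SMP node. It is an assumed C example of that baseline, not VCluster's API (VCluster itself is a Java library).

    /* Conventional hybrid baseline: MPI between processes, OpenMP threads
       inside each SMP node. Each process sums part of a range with its
       local threads, then the per-process results are combined with MPI. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long n = 1000000;
        long chunk = n / size;
        long begin = rank * chunk;
        long end = (rank == size - 1) ? n : begin + chunk;

        double local = 0.0;
    #pragma omp parallel for reduction(+:local)
        for (long i = begin; i < end; ++i)
            local += 1.0 / (double)(i + 1);

        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("partial harmonic sum H(%ld) = %f\n", n, global);

        MPI_Finalize();
        return 0;
    }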

    Domain Decomposition Strategies for Grid Environments

    In this work we are interested in running finite element numerical simulations with explicit time integration using Grid technology. Explicit finite element simulations currently distribute their data using domain decomposition with balanced partitions. However, this data distribution degrades the performance of explicit simulations considerably when they are run in Grid environments. The main reason is that communication in a Grid is heterogeneous: very fast within a machine and very slow between machines. As a result, a balanced data distribution runs at the speed of the slowest communication. To overcome this problem we propose overlapping remote communication time with computation time. To do so, some processors are dedicated to managing the slowest communication while the rest perform intensive computation. This distribution scheme requires an unbalanced domain decomposition, so that the processors dedicated to managing the slow communication carry almost no computational load. In this work we propose and analyze several strategies for distributing the data and improving application performance in Grid environments. The static distribution strategies analyzed are:
    1. U-1domains: The data domain is first divided among the machines in proportion to their relative speed. Within each machine, the data are then divided into nprocs-1 parts, where nprocs is the total number of processors in the machine. Each subdomain is assigned to one processor, and each machine has a single processor dedicated to managing remote communication with other machines.
    2. U-Bdomains: The data are partitioned in two phases. The first phase is the same as for the U-1domains distribution. The second phase divides each data subdomain proportionally into nprocs-B parts, where B is the number of remote communications with other machines (special domains). Each machine has more than one processor to manage remote communication.
    3. U-CBdomains: As many special domains are created as there are remote communications, but the special domains are now assigned to a single processor within the machine. Each data subdomain is therefore divided into nprocs-1 parts, and the remote communications are managed concurrently by threads.
    We use Dimemas to evaluate application performance in Grid environments. For each case, we evaluate the applications in different environments and with different mesh types. The results show that:
    · The U-1domains distribution reduces execution times by up to 45% with respect to the balanced distribution. However, it is not effective for Grid environments composed of a large number of remote machines.
    · The U-Bdomains distribution proves more efficient, reducing execution time by up to 53%. Its scalability is moderate, however, because it can end up with a large number of processors that do no intensive computation and only manage remote communication. As a limit, this distribution can only be applied if more than 50% of the processors in a machine perform computation.
    · The U-CBdomains distribution reduces execution times by up to 30%, which is not as effective as the U-Bdomains distribution, but it increases processor utilization by 50%, i.e. it reduces the number of idle processors.
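    A minimal sketch of the overlap idea behind these distributions, using nonblocking MPI: post the slow inter-machine boundary exchange first, compute on interior data while the messages are in flight, and only wait when the boundary values are needed. The thesis's actual scheme dedicates whole processors to the slow links; the neighbour ranks, array sizes, and helper functions below are illustrative assumptions.

    /* Overlap remote communication with computation via nonblocking MPI.
       compute_interior/compute_boundary are placeholders for the real
       explicit finite element kernels. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024   /* local subdomain size (illustrative) */

    static void compute_interior(double *u) { (void)u; /* heavy local work */ }
    static void compute_boundary(double *u, double l, double r) { (void)u; (void)l; (void)r; }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *u = calloc(N, sizeof *u);
        double halo_left = 0.0, halo_right = 0.0;
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Post the (possibly slow, remote) boundary exchange first... */
        MPI_Request req[4];
        MPI_Irecv(&halo_left,  1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&halo_right, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&u[0],       1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&u[N - 1],   1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* ...do the bulk of the computation while messages are in flight... */
        compute_interior(u);

        /* ...and only block on the slow links when the boundary is needed. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        compute_boundary(u, halo_left, halo_right);

        free(u);
        MPI_Finalize();
        return 0;
    }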

    Mesh Partitioning for Distributed Systems

    Distributed systems, which consist of a collection of high performance systems interconnected via high performance networks (e.g. ATM), are becoming feasible platforms for execution of large-scale, complex problems. In this paper, we address various issues related to mesh partitioning for distributed systems. These issues include the metric used to compare different partitions, efficiency of the application executing on a distributed system, the number of cut sets, and the advantage of exploiting heterogeneity in network performance. We present a tool called PART, for automatic mesh partitioning for distributed systems. The novel feature of PART is that it considers heterogeneities in the application and the distributed system. The heterogeneities in the distributed system include processor and network performance; the heterogeneities in the application include computational complexities. Preliminary results are presented for partitioning regular and irregular finite element meshes for..
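    The sketch below illustrates the heterogeneity-aware sizing step that a partitioner such as PART has to perform: subdomain sizes proportional to relative processor speed rather than an even split. The speeds and element count are made-up inputs, and PART itself also weighs network performance and per-element computational cost, which this toy calculation does not.

    /* Assign mesh elements to processors in proportion to relative speed. */
    #include <stdio.h>

    int main(void)
    {
        const double speed[] = { 1.0, 1.0, 2.0, 4.0 };  /* relative speeds (illustrative) */
        const int nprocs = sizeof speed / sizeof speed[0];
        const int nelems = 100000;                       /* mesh elements to place */

        double total = 0.0;
        for (int p = 0; p < nprocs; ++p)
            total += speed[p];

        int assigned = 0;
        for (int p = 0; p < nprocs; ++p) {
            /* Give the last processor the remainder so every element is placed. */
            int share = (p == nprocs - 1)
                          ? nelems - assigned
                          : (int)(nelems * speed[p] / total);
            assigned += share;
            printf("processor %d: %d elements\n", p, share);
        }
        return 0;
    }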

    Mesh Partitioning for Distributed Systems: Exploring Optimal Number of Partitions with Local and Remote Communication

    Mesh partitioning for distributed systems differs from partitioning for homogeneous systems in that both system and application heterogeneities need to be taken into consideration. In this paper, we focus on the issue of the optimal number of partitions with local and remote communication. This issue is important because local and remote communication times generally differ by orders of magnitude in latency. Our theoretical analyses indicate that when this difference is large, it is advantageous to decrease the number of messages sent remotely, thereby decreasing latency. Further, as the number of processors requiring remote communication decreases, more processors can be used to retrofit the partition, which aids in reducing execution time. We use an explicit finite element application and four irregular meshes to experiment on two geographically distributed IBM SP machines. Our results using our tool, PART, show that by forcing remote communication on one processor, the t..
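    A toy latency model of the trade-off studied here, under two stated assumptions: remote messages leaving one machine effectively serialize on the wide-area link, and latency dominates for the message sizes involved. It compares every boundary processor sending its own remote message against funnelling the data through one processor that sends a single combined message. The latency values are illustrative, not measurements from the paper.

    /* Compare k direct remote messages vs. a local gather plus one remote message. */
    #include <stdio.h>

    int main(void)
    {
        const double local_latency  = 50e-6;   /* ~50 us within a machine (assumed) */
        const double remote_latency = 20e-3;   /* ~20 ms between sites (assumed)    */

        for (int k = 1; k <= 16; k *= 2) {
            double direct   = k * remote_latency;                 /* each processor sends remotely */
            double funneled = k * local_latency + remote_latency; /* gather locally, send once     */
            printf("k=%2d  direct=%7.3f ms  funneled=%7.3f ms\n",
                   k, direct * 1e3, funneled * 1e3);
        }
        return 0;
    }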