15 research outputs found

    A Replication-Based Fault Tolerance Protocol Using Group Communication for the Grid

    No full text
    4th International Symposium on Parallel and Distributed Processing and Applications, ISPA 2006; Sorrento; Italy; 4 December 2006 through 6 December 2006
    We describe a replication-based protocol that uses group communication for fault tolerance in the Computational Grid. The Grid is partitioned into a number of clusters and each cluster has a designated coordinator that manages the states of the replicas within its cluster. The coordinators belong to a process group and the proposed protocol ensures the correct sequence of message deliveries to the replicas by the coordinators. Any failing node of the Grid is replaced by an active replica to provide correct continuation of the operation of the application. We show the theoretical framework along with illustrations of the replication protocol and its implementation results and analyze its performance and scalability.
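
The abstract above gives no protocol details, so the following is only a minimal single-cluster sketch of the general idea: a coordinator stamps each message with a sequence number, every live replica applies messages strictly in that order, and a failed primary is replaced by a live replica. The names `Replica`, `Coordinator`, `broadcast`, and `handle_failure` are hypothetical, and the coordinators' process group is not modeled here.

```python
class Replica:
    """Hypothetical replica that applies messages in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.log = []        # messages applied so far, in delivery order
        self.next_seq = 0    # next sequence number this replica may apply
        self.pending = {}    # messages that arrived ahead of their turn

    def deliver(self, seq, msg):
        # Buffer the message, then apply every message that is now in order.
        self.pending[seq] = msg
        while self.next_seq in self.pending:
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class Coordinator:
    """Hypothetical cluster coordinator: sequences messages, handles failover."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.primary = replicas[0]
        self.seq = 0

    def broadcast(self, msg):
        # Assign one sequence number per message and send it to live replicas.
        seq = self.seq
        self.seq += 1
        for replica in self.replicas:
            if replica.alive:
                replica.deliver(seq, msg)

    def handle_failure(self, failed):
        # Mark the node as failed; promote a live replica if it was the primary.
        failed.alive = False
        if failed is self.primary:
            self.primary = next(r for r in self.replicas if r.alive)


replicas = [Replica("r0"), Replica("r1"), Replica("r2")]
coordinator = Coordinator(replicas)
coordinator.broadcast("a")
coordinator.broadcast("b")
coordinator.handle_failure(replicas[0])   # primary fails; r1 takes over
coordinator.broadcast("c")
print(coordinator.primary.name, coordinator.primary.log)
```

Because every live replica has applied the same messages in the same order, the promoted replica can continue from the state the failed primary had reached.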

    A Dedicated Message Matching Mechanism for Collective Communications

    No full text

    A Cluster-Based Dynamic Load Balancing Middleware Protocol for Grids

    No full text
    European Grid Conference on Advances in Grid Computing - EGC 2005; Amsterdam; Netherlands; 14 February 2005 through 16 February 2005
    We describe a hierarchical dynamic load balancing protocol for Grids. The Grid consists of clusters and each cluster is represented by a coordinator. Each coordinator first attempts to balance the load in its cluster and, if this fails, communicates with the other coordinators to transfer or receive load. This process is repeated periodically. We show the implementation and analyze the performance and scalability of the proposed protocol.
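
The two-level scheme in the abstract above can be sketched as one balancing round. This is an illustration under stated assumptions, not the paper's protocol: load is modeled as an integer task count per node, a node counts as overloaded above a hypothetical threshold `high`, and all function names are invented for the example.

```python
def balance_locally(loads, high):
    """Shift tasks from nodes above `high` to nodes below it, in place.

    Returns the number of tasks that could not be placed inside the cluster.
    """
    for src in loads:
        for dst in loads:
            if loads[src] <= high:
                break
            spare = high - loads[dst]
            if spare > 0:
                moved = min(spare, loads[src] - high)
                loads[src] -= moved
                loads[dst] += moved
    return sum(max(0, load - high) for load in loads.values())


def spare_capacity(loads, high):
    """Total number of tasks the cluster can still accept below `high`."""
    return sum(max(0, high - load) for load in loads.values())


def remove_excess(loads, high, amount):
    """Take `amount` tasks away from nodes that are above `high`."""
    for node in loads:
        take = min(amount, max(0, loads[node] - high))
        loads[node] -= take
        amount -= take


def add_tasks(loads, high, amount):
    """Hand `amount` tasks to nodes that are below `high`."""
    for node in loads:
        put = min(amount, max(0, high - loads[node]))
        loads[node] += put
        amount -= put


def balance_round(clusters, high):
    """One round: balance inside each cluster, then between coordinators."""
    excess = {name: balance_locally(loads, high)
              for name, loads in clusters.items()}
    for src, surplus in excess.items():
        for dst in clusters:
            if surplus == 0:
                break
            if dst == src:
                continue
            moved = min(surplus, spare_capacity(clusters[dst], high))
            remove_excess(clusters[src], high, moved)
            add_tasks(clusters[dst], high, moved)
            surplus -= moved


clusters = {"A": {"a1": 9, "a2": 1}, "B": {"b1": 0, "b2": 2}}
balance_round(clusters, high=4)
print(clusters)
```

In this run cluster A first moves three tasks from `a1` to `a2` locally, and only the two tasks it cannot place are transferred to cluster B. A periodic scheduler would call `balance_round` repeatedly, as the abstract describes.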