
    A Decentralized and Fault Tolerant Convergence Detection Algorithm for Asynchronous Iterative Algorithms

    This article presents an algorithm that performs decentralized detection of the global convergence of parallel asynchronous iterative applications. The algorithm is fault tolerant: it runs a decentralized saving procedure that allows it, after a node crashes, to replace the dead node with a new one that continues the computing task from the last checkpoint. Combined with the advantages of the asynchronous iteration model, this method allows us to compute very large scale problems on highly volatile parallel architectures such as peer-to-peer and distributed cluster architectures. We also present the implementation of this algorithm in the JaceP2P platform, which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments show the robustness and efficiency of our algorithm.
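
To make the idea of asynchronous iterations with convergence detection concrete, here is a minimal single-process simulation. It is only a sketch under stated assumptions, not the decentralized, fault-tolerant protocol of the paper: the function `async_jacobi`, the random staleness model, and the rule "a node is locally converged after `stable_rounds` consecutive small updates" are all hypothetical choices made for illustration.

```python
import random


def async_jacobi(B, c, max_delay=3, tol=1e-10, stable_rounds=5,
                 max_rounds=10_000, seed=0):
    """Simulate asynchronous fixed-point iterations x = B x + c.

    Each "node" i owns component i and reads the other components from a
    possibly stale snapshot (at most max_delay rounds old). Global
    convergence is declared once every node has seen stable_rounds
    consecutive updates smaller than tol (illustrative rule only).
    """
    rng = random.Random(seed)
    n = len(c)
    history = [[0.0] * n]      # recent global states, newest last
    stable = [0] * n           # per-node counter of consecutive small updates
    for rounds in range(1, max_rounds + 1):
        current = history[-1]
        new = current[:]
        for i in range(n):
            seen = []
            for j in range(n):
                # Pick how stale node i's view of component j is.
                lag = rng.randint(0, min(max_delay, len(history) - 1))
                seen.append(history[-1 - lag][j])
            new[i] = sum(B[i][j] * seen[j] for j in range(n)) + c[i]
            # A large update resets the node's local-convergence counter.
            if abs(new[i] - current[i]) < tol:
                stable[i] += 1
            else:
                stable[i] = 0
        history.append(new)
        history = history[-(max_delay + 1):]   # keep only reachable snapshots
        if all(s >= stable_rounds for s in stable):
            return new, rounds
    raise RuntimeError("no convergence within max_rounds")
```

The simulation assumes the iteration matrix is a contraction in the maximum norm (every row of `B` has absolute sum below 1), which is the classical sufficient condition for asynchronous iterations with bounded delays to converge.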

    Parallel Numerical Asynchronous Iterative Algorithms: Large Scale Experimentations

    This paper presents typical problems encountered when executing large scale scientific applications over distributed architectures. The causes and effects of these problems are explained, and a solution for some classes of scientific applications is proposed. This solution combines the asynchronous iteration model with JACEP2P-V2, a fully decentralized and fault tolerant platform dedicated to executing parallel asynchronous applications over volatile distributed architectures. We explain in detail how our approach deals with each of these problems. We then present two large scale numerical experiments that demonstrate the efficiency and robustness of our approach.

    A Parallel Algorithm to Solve Large Stiff ODE Systems on Grid Systems

    This paper introduces a parallel algorithm to solve large stiff ODE systems on distributed clusters whose computing nodes are geographically distant from each other. The algorithm is based on the waveform relaxation method coupled with a sequential solver for differential equation systems. Compared with the standard PVODE algorithm (Parallel Variable-coefficient Ordinary Differential Equations solver; Byrne and Hindmarsh 1999), it drastically reduces the number of messages exchanged between nodes, which makes it less sensitive to slow communications. It is thus a coarse-grained algorithm well suited to grid environments connected via high-latency networks. In this paper, we present various experiments that compare the PVODE solver with our algorithm and show the benefits brought by this work.
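
The waveform relaxation idea mentioned in this abstract can be sketched on a toy problem. The example below is an illustration under assumptions, not the paper's algorithm: it uses a two-variable linear system y1' = -a·y1 + b·y2, y2' = c·y1 - d·y2, a Jacobi-style sweep, and implicit Euler as a stand-in for the sequential solver; the function names are hypothetical. Each sweep integrates one variable over the whole time window using the other variable's waveform from the previous sweep, so in a distributed setting the two sides would only need to exchange a full waveform once per sweep rather than data at every time step.

```python
def waveform_relaxation(a, b, c, d, y0, t_end, steps, sweeps):
    """Jacobi waveform relaxation with implicit Euler on each sub-problem."""
    h = t_end / steps
    # Initial guess: constant waveforms equal to the initial condition.
    y1 = [y0[0]] * (steps + 1)
    y2 = [y0[1]] * (steps + 1)
    for _ in range(sweeps):
        new1, new2 = [y0[0]], [y0[1]]
        for n in range(steps):
            # Each sub-problem sees only the OLD waveform of the other one.
            new1.append((new1[n] + h * b * y2[n + 1]) / (1 + h * a))
            new2.append((new2[n] + h * c * y1[n + 1]) / (1 + h * d))
        y1, y2 = new1, new2   # "exchange" whole waveforms once per sweep
    return y1, y2


def coupled_implicit_euler(a, b, c, d, y0, t_end, steps):
    """Reference: implicit Euler on the coupled system, 2x2 solve per step."""
    h = t_end / steps
    m11, m12, m21, m22 = 1 + h * a, -h * b, -h * c, 1 + h * d
    det = m11 * m22 - m12 * m21
    y1, y2 = [y0[0]], [y0[1]]
    for n in range(steps):
        r1, r2 = y1[n], y2[n]
        # Cramer's rule for (I - hA) y_{n+1} = y_n.
        y1.append((m22 * r1 - m12 * r2) / det)
        y2.append((m11 * r2 - m21 * r1) / det)
    return y1, y2
```

With a = 100 and d = 1 the system has widely separated time scales, and after a few dozen sweeps the relaxed waveforms agree with the coupled implicit Euler solution to within rounding error.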

    Energy Consumption Reduction for Asynchronous Message Passing Applications

    It is widely accepted that asynchronous parallel methods are more suitable than synchronous ones on a grid architecture. Indeed, they outperform synchronous methods because they overlap communications with computations. However, they usually also execute more iterations than synchronous methods and thus consume more energy. To reduce the energy consumption of the CPUs executing such methods, the dynamic voltage and frequency scaling (DVFS) technique can be used. It lowers the frequency of a CPU to reduce its energy consumption, but it also decreases its computing power. Therefore, the frequency that gives the best trade-off between energy consumption and performance must be selected. This paper presents a new online frequency selecting algorithm for parallel iterative asynchronous methods running over grids. It selects a vector of frequencies that gives the best trade-off between energy consumption and performance. New energy and performance models were used in this algorithm to predict the execution time and the energy consumption of synchronous, asynchronous, or hybrid iterative applications running over grids. The proposed algorithm was evaluated on the SimGrid simulator. The experiments showed that synchronously applying the proposed algorithm to the asynchronous version of the application reduces its energy consumption by 22% on average and speeds it up by 5.72%. Finally, the proposed algorithm was also compared to a method that uses the well-known energy and delay product, and the comparison results showed that it outperforms this method in terms of energy consumption and performance.

    Load balancing in cloud computing environments based on adaptive starvation threshold

    Clouds provide users with on-demand access to large computing and storage resources and offer many advantages over on-premise IT infrastructures in terms of cost, flexibility, and availability. However, this new paradigm still faces many challenges, and in this paper we address the load balancing problem. Even though many approaches have been proposed to balance the load among the servers, most of them are too sensitive to fluctuations in the cloud's load and produce unstable systems. In this paper, we propose a new distributed load balancing algorithm based on an adaptive starvation threshold. It tries to balance the load between the servers while minimizing the response time of the cloud, maximizing the utilization rate of the servers, decreasing the overall migration cost, and maintaining the stability of the system. The performance of the proposed algorithm was compared to that of a well-known load balancing algorithm inspired by honey bee behavior (HBB). The experimental results showed that the proposed load balancing algorithm gives considerable performance gains and a significant reduction in the number of migrations compared to the HBB algorithm.

    A parallel algorithm to solve large stiff ODE systems on grid systems

    This paper introduces a parallel algorithm to solve large stiff ODE systems in a geographically distant cluster environment. The algorithm couples the waveform relaxation concept with the CVODE algorithm. Compared with the standard PVODE algorithm, it drastically reduces the number of messages exchanged between nodes. It is a coarse-grained algorithm well suited to distant grid environments connected via high-latency networks. In this paper, we analyze the execution times of the PVODE solver and of our algorithm and explain the benefits brought by this work.

    A New Fault-Tolerant Algorithm Based on Replication and Preemptive Migration in Cloud Computing

    Cloud computing is a promising paradigm that provides users with greater computational advantages in terms of cost, flexibility, and availability. Nevertheless, with potentially thousands of connected machines, faults become more frequent. Consequently, fault-tolerant load balancing becomes necessary in order to optimize resource utilization while ensuring the reliability of the system. Common fault tolerance techniques in cloud computing have been proposed in the literature. However, they suffer from several shortcomings: some use checkpoint-recovery, which increases the average waiting time and thus the mean response time, while others rely on task replication, which reduces the cloud's efficiency in terms of resource utilization under variable loads. To address these deficiencies, an efficient and adaptive fault-tolerant algorithm for load balancing is proposed. Based on the CloudSim simulator, several series of test-bed scenarios are considered to assess the behavior of the proposed algorithm.

    Optimizing the energy consumption of message passing applications with iterations executed over grids

    In recent years, green computing has become an important topic in the supercomputing research domain. However, computing platforms are still consuming more and more energy due to the increasing number of nodes composing them. Many techniques have been used to minimize the operating costs of these platforms, and dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing by lowering its frequency. However, lowering the frequency of a CPU may increase the execution time of an application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected. In this paper, a new online frequency selecting algorithm for grids composed of heterogeneous clusters is presented. For each node computing the message passing application with iterations, it selects the frequency that tries to give the best trade-off between energy saving and performance degradation. The algorithm has a small overhead and works without training or profiling. It uses a new energy model for message passing applications with iterations running on a grid. The proposed algorithm is evaluated on a real grid, the Grid’5000 platform, while running the NAS parallel benchmarks. The experiments on 16 nodes, distributed over three clusters, show that it reduces the energy consumption by 30% on average while degrading the performance by only 3.2% on average. Finally, the algorithm is compared to an existing method. The comparison results show that it outperforms the latter in terms of energy consumption reduction and performance.
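
The energy/performance trade-off described in this abstract can be illustrated with a small frequency-selection sketch. Everything here is an assumption made for illustration and not the model or objective of the paper: computation time is taken to scale as 1/f while communication time is unaffected, dynamic power is taken to scale as f³, static power is constant, and the score to maximize is simply normalized performance minus normalized energy. The function `select_frequency` and its parameters are hypothetical.

```python
def select_frequency(freqs, t_comp, t_comm, p_dyn, p_static):
    """Pick the frequency with the best energy/performance trade-off.

    freqs    : available CPU frequencies (any consistent unit)
    t_comp   : computation time per iteration at the maximum frequency
    t_comm   : communication time per iteration (assumed frequency-independent)
    p_dyn    : dynamic power at the maximum frequency
    p_static : static power, drawn for the whole iteration
    """
    f_max = max(freqs)
    t_ref = t_comp + t_comm                       # time at f_max
    e_ref = p_dyn * t_comp + p_static * t_ref     # energy at f_max
    best_score, best_freq = None, None
    for f in freqs:
        s = f_max / f                             # slowdown factor, >= 1
        t = t_comp * s + t_comm                   # only computation slows down
        # Dynamic power ~ f^3, applied during the (longer) computation phase.
        e = (p_dyn / s ** 3) * (t_comp * s) + p_static * t
        norm_perf = t_ref / t                     # 1.0 at f_max, lower is slower
        norm_energy = e / e_ref                   # 1.0 at f_max, lower is greener
        score = norm_perf - norm_energy
        if best_score is None or score > best_score:
            best_score, best_freq = score, f
    return best_freq
```

Under this toy model, a communication-bound iteration pushes the choice toward the lowest frequency, since slowing the CPU costs little time, whereas a compute-bound iteration settles on an intermediate frequency.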