2 research outputs found

    A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments.

    Get PDF
    Abstract. Fault tolerance is an essential requirement in systems running applications which need a technique to continue execution where some system components are subject to failure. In this paper, a fault tolerant task scheduling algorithm is proposed for mapping task graphs to heterogeneous processing nodes in cluster computing systems. The starting point of the algorithm is a DAG representing an application with information about the tasks. This information consists of the execution time of the tasks on the target system processors, communication times between the tasks having data dependencies, and the number of the processor failures (ε) which should be tolerated by the scheduling algorithm. The algorithm is based on the active replication scheme, and it schedules ε+1 replicas of each task to achieve the required fault tolerance. Simulation results show the efficiency of the proposed algorithm in spite of its lower complexity
    corecore