481 research outputs found

    Energy-Efficient Fault-Tolerant Scheduling Algorithm for Real-Time Tasks in Cloud-Based 5G Networks

    © 2013 IEEE. Green computing has become a hot issue for both academia and industry. Fifth-generation (5G) mobile networks place high demands on energy efficiency and low latency. The cloud radio access network provides efficient resource use, high performance, and high availability for 5G systems. However, hardware and software faults in cloud systems may cause real-time services to fail. Developing fault-tolerance techniques can efficiently enhance the reliability and availability of real-time cloud services. The core idea of a fault-tolerant scheduling algorithm is to introduce redundancy so that tasks can still finish in the case of permanent or transient system failure. Nevertheless, the redundancy incurs extra overhead for cloud systems, which results in considerable energy consumption. In this paper, we focus on the problem of how to reduce energy consumption while providing fault tolerance. We first propose a novel primary-backup-based fault-tolerant scheduling architecture for real-time tasks in the cloud environment. Based on the architecture, we present an energy-efficient fault-tolerant scheduling algorithm for real-time tasks (EFTR). EFTR adopts a proactive strategy to increase the system processing capacity and employs a rearrangement mechanism to improve resource utilization. Simulation experiments are conducted on the CloudSim platform to evaluate the feasibility and effectiveness of EFTR. Compared with existing fault-tolerant scheduling algorithms, EFTR shows excellent performance in energy conservation and task schedulability.

    A Fault-Tolerant Scheduling Algorithm using Hybrid Overloading Technology for Dynamic Grouping based Multiprocessor Systems

    To extend the application area of hybrid-overloading-based fault-tolerant scheduling for multiprocessors and to increase the number of processor faults that can be tolerated, we propose a new fault-tolerant scheduling algorithm based on hybrid overloading and dynamic grouping. It combines a logical grouping strategy for processors with primary-backup overloading and backup-backup overloading. The algorithm formalizes the dynamic grouping of processors in fault-tolerant scheduling based on hybrid overloading and enlarges the number of tasks included in an overloading task link. During fault-tolerant scheduling, the processors are dynamically divided into groups based on the overloading task link, so as to keep a good scheduling success ratio and enhance the fault-tolerant performance of the processors. Both theoretical analysis and simulation experiments demonstrate the algorithm's effectiveness.

    Fault-tolerant scheduling with dynamic number of replicas in heterogeneous systems

    In the existing studies on fault-tolerant scheduling, the active replication schema makes use of ε + 1 replicas for each task to tolerate ε failures. However, in this paper, we show that more replicas do not always lead to higher reliability. Besides, more replicas imply more resource consumption and higher economic cost. To address this problem, with the target of satisfying the user's reliability requirement with minimum resources, this paper proposes a new fault-tolerant scheduling algorithm: MaxRe. In the algorithm, we incorporate reliability analysis into the active replication schema, and the theoretical analysis and experiments show that the MaxRe algorithm's schedule satisfies the user's reliability requirements. The MaxRe scheduling algorithm can achieve the corresponding reliability with at most 70% fewer resources than the FTSA algorithm.
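
    As a rough illustration of choosing a replica count from a reliability requirement rather than fixing it at ε + 1, the sketch below counts how many replicas are needed before the chance that at least one succeeds reaches a target. It is not the MaxRe algorithm: the function name `replicas_needed`, the per-processor reliability values, and the assumption that replicas fail independently are all hypothetical simplifications, and this simple model does not reproduce the paper's observation that extra replicas can fail to improve reliability.

```python
from typing import Sequence


def replicas_needed(reliabilities: Sequence[float], target: float) -> int:
    """Return the smallest number of replicas that meets the reliability target.

    reliabilities: assumed success probability of the task on each candidate
                   processor (hypothetical inputs, failures assumed independent).
    target:        required probability that at least one replica succeeds.
    Returns -1 if even using every candidate processor misses the target.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    all_fail = 1.0  # probability that every replica chosen so far fails
    # Use the most reliable processors first so the count stays small.
    ordered = sorted(reliabilities, reverse=True)
    for count, r in enumerate(ordered, start=1):
        all_fail *= 1.0 - r
        if 1.0 - all_fail >= target:
            return count
    return -1


if __name__ == "__main__":
    print(replicas_needed([0.9, 0.9, 0.9], 0.98))
    print(replicas_needed([0.9, 0.8], 0.999))
```

    With three processors of reliability 0.9 and a target of 0.98, two replicas suffice under these assumptions, so a fixed choice of three replicas would spend one more than needed.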

    Fault-Tolerant scheduling for scientific workflows in cloud environments

    Executing clustered tasks has proven to be an efficient method to improve the computation of Scientific Workflows (SWf) on clouds. However, clustered tasks have a higher probability of suffering from failures than a single task. Therefore, fault tolerance in cloud computing is essential when running large-scale scientific applications. In this paper, a new heuristic called the Cluster based Heterogeneous Earliest Finish Time (CHEFT) algorithm is proposed to enhance the scheduling and fault tolerance mechanism for SWf in highly distributed cloud environments. To mitigate the failure of clustered tasks, this algorithm uses the idle time of the provisioned resources to resubmit failed clustered tasks for successful execution of SWf. Experimental results show that the proposed algorithm has a convincing impact on SWf executions and also drastically reduces resource waste compared to existing task replication techniques. A trace-based simulation of five real SWf shows that this algorithm is able to sustain unexpected task failures with minimal cost and makespan. © 2017 IEEE
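
    The idea of resubmitting a failed task into idle time on already provisioned resources can be sketched as follows. This is a minimal illustration and not the CHEFT heuristic: the slot representation `(resource, start, end)`, the function name `pick_idle_slot`, and the "earliest fitting slot" rule are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# (resource name, idle start time, idle end time); hypothetical representation
Slot = Tuple[str, float, float]


def pick_idle_slot(idle_slots: List[Slot], runtime: float) -> Optional[Slot]:
    """Return the earliest idle slot long enough to rerun a failed task.

    Returns None when no slot can hold the task, in which case a real
    scheduler would have to provision extra capacity instead.
    """
    fitting = [slot for slot in idle_slots if slot[2] - slot[1] >= runtime]
    if not fitting:
        return None
    # Earliest start first; ties are broken by resource name for determinism.
    return min(fitting, key=lambda slot: (slot[1], slot[0]))


if __name__ == "__main__":
    slots = [("vm-a", 10.0, 12.0), ("vm-b", 15.0, 30.0), ("vm-c", 20.0, 40.0)]
    print(pick_idle_slot(slots, runtime=8.0))
```

    Reusing idle slots this way avoids running a second copy of every task up front, which is the resource waste the abstract attributes to replication-based techniques.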

    Fault Tolerant Scheduling of Precedence Task Graphs on Heterogeneous Platforms

    Fault tolerance and latency are important requirements in several time-critical applications: such applications require guarantees in terms of latency, even when processors are subject to failures. In this paper, we propose a fault-tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme capable of supporting ε arbitrary fail-silent (fail-stop) processor failures, hence valid results will be provided even if ε processors fail. We focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Major achievements include a low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the FTBAR scheduling algorithm [8].
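
    To make the active replication scheme concrete, here is a minimal sketch of placing ε + 1 replicas of a single task on distinct processors so that ε fail-stop failures still leave one replica running. It is not the paper's heuristic: it ignores precedence constraints and communication costs, and the names `place_replicas`, `exec_time`, and `ready_at` are hypothetical.

```python
from typing import List, Sequence, Tuple


def place_replicas(
    exec_time: Sequence[float], ready_at: Sequence[float], eps: int
) -> Tuple[List[int], float]:
    """Place eps + 1 replicas of one task on the earliest-finishing processors.

    exec_time[p]: assumed runtime of the task on processor p (heterogeneous).
    ready_at[p]:  assumed time at which processor p becomes free.
    Returns the chosen processor indices and the worst-case finish time, that
    is, the finish time if only the slowest chosen replica survives.
    """
    if len(exec_time) != len(ready_at):
        raise ValueError("exec_time and ready_at must have the same length")
    if eps < 0 or eps + 1 > len(exec_time):
        raise ValueError("need at least eps + 1 distinct processors")

    finish = sorted((ready_at[p] + exec_time[p], p) for p in range(len(exec_time)))
    chosen = finish[: eps + 1]
    processors = [p for _, p in chosen]
    worst_case_finish = chosen[-1][0]
    return processors, worst_case_finish


if __name__ == "__main__":
    print(place_replicas(exec_time=[4.0, 2.0, 6.0], ready_at=[0.0, 3.0, 0.0], eps=1))
```

    Because the replicas sit on distinct processors, any ε failures leave at least one replica alive, and the returned worst-case finish time bounds the latency in that situation.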

    Fault Tolerant Scheduling of Partitioned and Grouped Jobs in Grid Computing (FTPG)

    Computational grids have the potential for solving scientific and large-scale problems using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling resources, reliability challenges arise because the grid infrastructure is unreliable. There are two major problems in scheduling the grid: (1) efficient scheduling of jobs and (2) providing fault tolerance in a reliable manner. Most of the existing strategies do not provide fault tolerance. Some algorithms do provide it, but they perform a large amount of redundant computation to do so. This paper addresses this issue and minimizes redundant work by using a group-level table of data. This technique is suitable for partitioning and group scheduling of jobs.

    Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds

    Cloud service providers offer computing resources at a reasonable price under a pay-per-use model. Further, cloud service providers have also introduced different pricing models such as spot, blockspot and spotfleet instances that are cost-effective, and users have to bid so as to balance reliability against monetary cost. Hence, Scientific Workflows (SWf), which are used to model applications involving high throughput, computation and complex large-scale data analysis, are increasingly adopting these computing resources. Nevertheless, spot instances are terminated when the market spot price exceeds the user's bid price. Moreover, failures are inevitable in such large distributed systems and often make it challenging to design a fault-tolerant scheduling algorithm for SWf. This paper presents an efficient, low-cost and fault-tolerant scheduling algorithm and a bidding strategy to minimize the

    Restart-Based Fault-Tolerance: System Design and Schedulability Analysis

    Embedded systems in safety-critical environments are continuously required to deliver more performance and functionality, while being expected to provide verified safety guarantees. Nonetheless, platform-wide software verification (required for safety) is often expensive. Therefore, design methods that enable the use of components such as real-time operating systems (RTOS), without requiring their correctness to guarantee safety, are necessary. In this paper, we propose a design approach to deploy safe-by-design embedded systems. To attain this goal, we rely on a small core of verified software to handle faults in applications and the RTOS and recover from them while ensuring that the timing constraints of safety-critical tasks are always satisfied. Faults are detected by monitoring the application timing, and fault recovery is achieved via full platform restart and software reload, enabled by the short restart time of embedded systems. Schedulability analysis is used to ensure that the timing constraints of critical plant control tasks are always satisfied in spite of faults and consequent restarts. We derive schedulability results for four restart-tolerant task models. We use a simulator to evaluate and compare the performance of the considered scheduling models.

    A Survey of Research into Mixed Criticality Systems

    This survey covers research into mixed criticality systems that has been published since Vestal’s seminal paper in 2007, up until the end of 2016. The survey is organised along the lines of the major research areas within this topic. These include single processor analysis (including fixed priority and EDF scheduling, shared resources and static and synchronous scheduling), multiprocessor analysis, realistic models, and systems issues. The survey also explores the relationship between research into mixed criticality systems and other topics such as hard and soft time constraints, fault tolerant scheduling, hierarchical scheduling, cyber physical systems, probabilistic real-time systems, and industrial safety standards